commit faaf2158d45f1715395d1007361b04ca0a912b83
Author: 医疗报告项目 <984523799@qq.com>
Date:   Fri Feb 13 18:32:52 2026 +0800

    Initialize the medical report generation project and add core code files

diff --git a/.gitignore b/.gitignore
new file mode 100644
index 0000000..cad7777
--- /dev/null
+++ b/.gitignore
@@ -0,0 +1,45 @@
+# Ignore report files
+reports/
+
+# Ignore cache files
+__pycache__/
+
+# Ignore environment variable files
+.env
+
+# Ignore log files
+*.log
+
+# Ignore Excel files
+*.xlsx
+
+# Ignore PDF files
+*.pdf
+
+# Ignore temporary files
+*.tmp
+*.temp
+
+# Ignore compiled files
+*.pyc
+
+# Ignore IDE files
+.vscode/
+.idea/
+
+# Ignore frontend dependencies
+frontend/node_modules/
+
+# Ignore system files
+Thumbs.db
+.DS_Store
+
+# Ignore debug files
+debug_*
+
+# Ignore cache files
+deepseek_cache.json
+
+# Ignore template files (adjust as needed if there are multiple templates)
+Be.U Wellness Center功能医学健康报告&定制化方案-案例.docx
+Be.U+Wellness+Center功能医学健康报告&amp;定制化方案-+Ming+Wang.docx
diff --git a/.kiro/specs/medical-report-generator/requirements.md b/.kiro/specs/medical-report-generator/requirements.md
new file mode 100644
index 0000000..18de0c9
--- /dev/null
+++ b/.kiro/specs/medical-report-generator/requirements.md
@@ -0,0 +1,133 @@
+# Requirements Document
+
+## Introduction
+
+The Medical Report Generator is an automated tool that extracts data from medical test PDF reports, uses AI analysis to generate health assessments and recommendations, and fills the results into a professional Word template to produce a complete functional medicine health report.
+
+The system supports OCR recognition, DeepSeek AI analysis, protected-region management for the template, and automatic page-break handling, among other features.
+
+## Glossary
+
+- **Report_Generator**: The core module of the medical report generation system
+- **OCR_Service**: The Baidu OCR service, used to extract text and data from PDFs
+- **DeepSeek_Service**: The DeepSeek AI service, used to analyze abnormal indicators and generate health assessment content
+- **Template_Service**: The Word template processing service, responsible for filling in data and formatting
+- **Protected_Region**: The protected region: the content of the first four pages of the template (everything before the client health program), which must not be modified
+- **Health_Program_Boundary**: The protection boundary: the position of "客户健康方案/Client Health Program" in the document
+- **Module_Title**: A module title, i.e. a category heading such as "血液学检测" (hematology tests) or "内分泌检测" (endocrine tests)
+- **ABB**: The abbreviation that identifies a test item
+- **Clinical_Significance**: Clinical significance: the medical interpretation of a test result
+
+## Requirements
+
+### Requirement 1: PDF Data Extraction
+
+**User Story:** As a medical report operator, I want to automatically extract data from PDF medical test reports, so that I can obtain test results quickly without manual entry.
+
+#### Acceptance Criteria
+
+1. WHEN the user provides a PDF medical report file THEN THE OCR_Service SHALL use Baidu OCR to recognize and extract all text content
+2. WHEN the PDF contains test data tables THEN THE OCR_Service SHALL extract the item name, result value, reference range, unit, and abnormality flag
+3. WHEN extraction completes THEN THE Report_Generator SHALL save the data as a JSON cache file
+4. IF a cache file already exists and a refresh is not forced THEN THE Report_Generator SHALL use the cached data instead of re-extracting
+
+### Requirement 2: Data Matching and Mapping
+
+**User Story:** As a system administrator, I want to match the extracted data to template fields, so that the data is filled into the report template correctly.
+
+#### Acceptance Criteria
+
+1. WHEN data extraction completes THEN THE Report_Generator SHALL map test items to their template fields according to the ABB configuration file
+2. WHEN a test item carries an abnormality flag (↑, ↓, H, L) THEN THE Report_Generator SHALL correctly identify and record the abnormal status
+3. WHEN a test item cannot be matched through the configuration file THEN THE DeepSeek_Service SHALL analyze the item and classify it into a module
+4. IF a qualitative result is identical to the reference range THEN THE Report_Generator SHALL NOT flag it as abnormal
+
+### Requirement 3: Protected Region Management
+
+**User Story:** As a report designer, I want to protect the first four pages of the template from modification, so that company branding and fixed content remain intact.
+
+#### Acceptance Criteria
+
+1. WHEN document processing starts THEN THE Report_Generator SHALL dynamically locate "客户健康方案/Client Health Program" and use its position as the protection boundary
+2. WHILE processing document content THEN THE Report_Generator SHALL NOT modify any element before the protection boundary
+3. WHEN document processing completes THEN THE Report_Generator SHALL copy the protected region from the original template into the output file
+4. WHEN validating the output file THEN THE Report_Generator SHALL ensure that the XML elements of the protected region match the template byte for byte
+5. WHEN validating the output file THEN THE Report_Generator SHALL ensure that the MD5 hashes of the protected region's media files are identical to those of the template
+
+### Requirement 4: AI Health Assessment Generation
+
+**User Story:** As a functional medicine expert, I want health assessment content to be generated automatically from abnormal indicators, so that the report contains professional medical analysis.
+
+#### Acceptance Criteria
+
+1. WHEN abnormal test indicators exist THEN THE DeepSeek_Service SHALL collect all abnormal items and generate the "整体健康状况评估" (Overall Health Assessment) content
+2. WHEN generating the health assessment THEN THE DeepSeek_Service SHALL automatically divide it into suitable subsections by abnormality type (hematology, endocrine, immune, metabolic, etc.)
+3. WHEN generating content THEN THE DeepSeek_Service SHALL write the English content first and then translate it into Chinese sentence by sentence
+4. WHEN abnormal test indicators exist THEN THE DeepSeek_Service SHALL generate the "功能医学健康建议" (Functional Medical Health Advice) content
+5. WHEN generating health advice THEN THE DeepSeek_Service SHALL include five fixed subsections: nutrition intervention, exercise intervention, sleep and stress management, lifestyle adjustment, and long-term follow-up plan
+
+### Requirement 5: Clinical Significance Generation
+
+**User Story:** As a medical report reader, I want to see an explanation of the clinical significance of each test item, so that I can understand what the results mean medically.
+
+#### Acceptance Criteria
+
+1. WHEN a new test item table is added THEN THE DeepSeek_Service SHALL generate a Clinical Significance explanation for that item
+2. WHEN generating clinical significance THEN THE DeepSeek_Service SHALL provide both English and Chinese versions
+3. WHEN filling in clinical significance THEN THE Template_Service SHALL use the correct font styles (Times New Roman 10.5pt for English, SimSun (宋体) 12pt for Chinese)
+
+### Requirement 6: Document Formatting and Pagination
+
+**User Story:** As a report reader, I want the report to have clear pagination and formatting, so that the content is easy to read and print.
+
+#### Acceptance Criteria
+
+1. WHEN processing images after the protected region THEN THE Report_Generator SHALL insert a page break before each image
+2. WHEN processing module titles THEN THE Report_Generator SHALL insert a page break before each module title (except the first module)
+3. WHEN identifying module titles THEN THE Report_Generator SHALL exclude text longer than 50 characters
+4. WHEN identifying module titles THEN THE Report_Generator SHALL exclude descriptive text that begins with "因此", "所以", or "综上" ("therefore", "so", "in summary")
+5. WHEN identifying module titles THEN THE Report_Generator SHALL exclude long sentences that contain punctuation such as periods or commas
+
+### Requirement 7: Table Creation and Filling
+
+**User Story:** As a report generation system, I want to create standard-format tables for missing test items, so that all test data is displayed correctly.
+
+#### Acceptance Criteria
+
+1. WHEN a test item has no corresponding position in the template THEN THE Template_Service SHALL create a new table within the corresponding module
+2. WHEN creating a table THEN THE Template_Service SHALL include the columns ABB, item name, result, indicator, reference range, and unit
+3. WHEN creating a table THEN THE Template_Service SHALL include a merged Clinical Significance row
+4. WHEN setting table borders THEN THE Template_Service SHALL use a solid top border and dashed borders elsewhere
+
+### Requirement 8: Empty Row Cleanup and Table Merging
+
+**User Story:** As a report quality controller, I want to remove empty data rows and merge tables, so that the report is clean and free of redundancy.
+
+#### Acceptance Criteria
+
+1. WHEN the Result column of a data row is empty THEN THE Report_Generator SHALL delete that empty data row
+2. WHEN a table header is followed only by a description and no data THEN THE Report_Generator SHALL delete the description and move the content of the data table below it up
+3. WHILE cleaning up empty rows THEN THE Report_Generator SHALL NOT affect the content of the protected region
+
+### Requirement 9: Report Output and Validation
+
+**User Story:** As a system operator, I want to generate and validate the final report, so that I can ensure its quality.
+
+#### Acceptance Criteria
+
+1. WHEN report generation completes THEN THE Report_Generator SHALL save the report as a timestamped DOCX file
+2. WHEN validating the report THEN THE Report_Generator SHALL check the integrity of the protected region
+3. WHEN validating the report THEN THE Report_Generator SHALL check that every image has a page break
+4. WHEN validating the report THEN THE Report_Generator SHALL check that module titles have page breaks
+
+### Requirement 10: Command-Line Interface
+
+**User Story:** As a developer, I want to control report generation through command-line arguments, so that I can configure run options flexibly.
+
+#### Acceptance Criteria
+
+1. WHEN the user runs the script THEN THE Report_Generator SHALL support the --method argument for selecting the generation method (extract or docxtpl)
+2. WHEN the user specifies the --force argument THEN THE Report_Generator SHALL force data re-extraction instead of using the cache
+3. WHEN the user specifies the --deepseek argument THEN THE Report_Generator SHALL enable DeepSeek AI analysis
+4. WHEN the user specifies the --api-key argument THEN THE Report_Generator SHALL use the specified API key
+5. IF no API key is specified but DeepSeek is enabled THEN THE Report_Generator SHALL read the key from the DEEPSEEK_API_KEY environment variable
diff --git a/.windsurfrules b/.windsurfrules
new file mode 100644
index 0000000..6c76d9d
--- /dev/null
+++ b/.windsurfrules
@@ -0,0 +1,26 @@
+# Keep Grinding (继续牛马) - AI Persistent Output Rules
+
+⚠️⚠️⚠️ Highest-priority warning ⚠️⚠️⚠️
+In all circumstances, the cunzhi.ask_continue tool must be called before the end of every reply! No exceptions!
+
+## Core Rules
+
+1. **Complete output** - Output complete code and answers; using ellipses or "code too long to show" is forbidden
+
+2. **Continuous execution** - Finish complex tasks in one go where possible; do not interrupt frequently or repeatedly ask whether to continue
+
+3. **End confirmation** - When you want to end the conversation or task, you must first call the cunzhi.ask_continue tool to ask the user whether to continue
+
+4. **No closing remarks** - Before the conversation ends, do not directly use closing remarks such as "let me know if you have any other questions"
+
+## Important Notes
+
+- When calling the ask_continue tool, set the reason parameter to the reason you want to end
+- The user may enter new instructions in the pop-up; continue according to those instructions
+- The conversation may end only when the user explicitly chooses "结束" (End)
+
+## Enforcement
+
+This is the highest-priority rule and must be strictly followed. Before the end of every reply, you must check whether the ask_continue tool needs to be called. If the task is complete or you are about to end the reply, you must first call the tool to obtain the user's permission.
+
+⚠️ Warning: in all circumstances, cunzhi.ask_continue must be called before the end of a reply, with no exceptions!
\ No newline at end of file
diff --git a/backend/.env.example b/backend/.env.example
new file mode 100644
index 0000000..51f2b64
--- /dev/null
+++ b/backend/.env.example
@@ -0,0 +1,46 @@
+# ========================================
+# Baidu Cloud OCR configuration (recommended, high recognition accuracy)
+# ========================================
+# How to obtain: https://cloud.baidu.com/product/ocr
+# 1. Register a Baidu AI Cloud account
+# 2. Create a text recognition application
+# 3. Obtain the AppID, API Key, and Secret Key
+BAIDU_OCR_APP_ID=your-app-id
+BAIDU_OCR_API_KEY=your-api-key
+BAIDU_OCR_SECRET_KEY=your-secret-key
+
+# ========================================
+# LLM configuration (choose one)
+# ========================================
+
+# Coze workflow configuration (recommended; two-stage processing: Coze analysis + Ollama refinement)
+# How to obtain: https://www.coze.cn
+# 1. Register a Coze account and create a workflow
+# 2. Obtain the API Key and Workflow ID
+# 3. Set the following parameters to enable two-stage processing (Coze → Ollama)
+# Note: text input mode is used; text extracted by OCR is passed to the Coze workflow
+COZE_API_KEY=your-coze-api-key
+COZE_WORKFLOW_ID=7574271851028217908
+COZE_API_URL=https://api.coze.cn/v1/workflow/run
+COZE_MAX_RETRIES=3
+
+# OpenAI configuration
+OPENAI_API_KEY=your-openai-api-key-here
+OPENAI_MODEL=gpt-3.5-turbo
+
+# Ollama local model configuration (required; used to refine Coze results or on its own)
+OLLAMA_HOST=http://localhost:11434
+OLLAMA_MODEL=qwen2.5:7b
+
+# ========================================
+# Notes
+# ========================================
+# 1. Copy this file to .env and fill in the actual configuration
+# 2. OCR engine: MinerU (high-accuracy document parsing)
+# 3. LLM processing flow:
+#    - If Coze is configured: two-stage processing (Coze analysis + Ollama refinement)
+#    - If only Ollama is configured: single-stage processing (Ollama directly)
+#    - Priority: OpenAI > Coze > Ollama > mock mode
+# 4. Batch report feature:
+#    - Upload multiple PDFs → OCR text extraction → combined LLM analysis → generate a Be.U-style PDF report
+#    - PDFs are generated with xhtml2pdf (no extra dependencies required)
diff --git a/backend/_check_report.py b/backend/_check_report.py
new file mode 100644
index 0000000..51bd0c4
--- /dev/null
+++ b/backend/_check_report.py
@@ -0,0 +1,23 @@
+from docx import Document
+
+report_path = r'C:\Users\UI\Desktop\医疗报告\backend\reports\filled_report_20260212_154247.docx'
+doc = Document(report_path)
+body = doc.element.body
+children = list(body)
+
+keywords = ['overall health', '整体健康', 'medical intervention', '医学干预',
+            'functional medical health', '功能医学健康建议',
+            'nutrition intervention', '营养干预', 'exercise intervention', '运动干预',
+            'sleep', '睡眠', 'lifestyle', '生活方式', 'long-term', '长期随访',
+            '功能医学检测档案', 'abnormal index', '异常指标']
+
+print(f"Total document elements: {len(children)}")
+print("=" * 80)
+
+for i, elem in enumerate(children):
+    text = ''.join(elem.itertext()).strip()
+    if text:
+        text_lower = text.lower()
+        if any(kw in text_lower for kw in keywords):
+            tag = elem.tag.split("}")[-1]
+            print(f'[{i}] {tag}: {text[:150]}')
diff --git a/backend/abb_mapping_config.json b/backend/abb_mapping_config.json
new file mode 100644
index 0000000..1e637af
--- /dev/null
+++ b/backend/abb_mapping_config.json
@@ -0,0 +1,529 @@
+{
+  "description": "Standard test-item ABB mapping configuration extracted from 2.pdf, ordered by PDF page",
+  "version": "2.0",
+  "last_updated": "2026-01-09",
+  "total_modules": 24,
+  "modules": {
+    "Urine Test": {
+      "cn_name": "尿液检测",
+      "pages": "16-19",
+      "order": 1,
+      "items": [
+        {"abb": "Color", "project": "Color", "project_cn": "颜色"},
+        {"abb": "pH", "project": "pH", "project_cn": "酸碱度"},
+        {"abb": "TuR", "project": "Turbidity", "project_cn": "浊度"},
+        {"abb": "PRO", "project": "Protein", "project_cn": "蛋白质"},
+        {"abb": "BLD", "project": "Occult Blood", "project_cn": "隐血或红细胞"},
+        {"abb": "GLU", "project": "Glucose", "project_cn": "糖"},
+        {"abb": "SG", "project": "Specific Gravity", "project_cn": "比重"},
+        {"abb": "LEU", "project": "Leucocyte", "project_cn": "白细胞"},
+        {"abb": "NIT", "project": "Nitrite", "project_cn": "亚硝酸盐"},
+        {"abb": "KET", "project": "Ketone", "project_cn": "酮体"}
+      ]
+    },
+    "Complete Blood Count": {
+      "cn_name": "血常规",
+      "pages": "20-26",
+      "order": 2,
+      "items": [
+        {"abb": "RBC", "project": "RBC Count", "project_cn": "红细胞计数"},
+        {"abb": "Hb", "project": "Hemoglobin", "project_cn": "血红蛋白"},
+        {"abb": "HCT", "project": "Hematocrit", "project_cn": "红细胞压积"},
+        {"abb": "MCV", "project": "Mean Corpuscular Volume", "project_cn": "平均红细胞体积"},
+        {"abb": "MCH", "project": "Mean Corpuscular Hemoglobin", "project_cn": "平均红细胞血红蛋白含量"},
+        {"abb": "MCHC", "project": "Mean Corpuscular Hemoglobin Concentration", "project_cn": "平均红细胞血红蛋白浓度"},
+        {"abb": "RDW", "project": "Red Cell Distribution Width", "project_cn": "红细胞体积分布宽度"},
+        {"abb": "RBC Morphology", "project": "RBC Morphology", "project_cn": "红细胞形态"},
+        {"abb": "WBC count", "project": "WBC Count", "project_cn": "白细胞总数"},
+        {"abb": "NEUT", "project": "Neutrophil Count", "project_cn": "中性粒细胞数量"},
+        {"abb": "NEUT%", "project": "Neutrophil Percentage", "project_cn": "中性粒细胞百分含量"},
+        {"abb": "EOS", "project": "Eosinophil Count", "project_cn": "嗜酸细胞数量"},
+        {"abb": "EOS%", "project": "Eosinophil Percentage", "project_cn": "嗜酸细胞百分含量"},
+        {"abb": "BAS", "project": "Basophil Count", "project_cn": "嗜碱细胞数量"},
+        {"abb": "BAS%", "project": "Basophil Percentage", "project_cn": "嗜碱细胞百分含量"},
+        {"abb": "LYMPH", "project": "Lymphocyte Count", "project_cn": "淋巴细胞数量"},
+        {"abb": "LYMPH%", "project": "Lymphocyte Percentage", "project_cn": "淋巴细胞百分含量"},
+        {"abb": "MONO", "project": "Monocyte Count", "project_cn": "单核细胞数量"},
+        {"abb": "MONO%", "project": "Monocyte Percentage", "project_cn": "单核细胞百分含量"},
+        {"abb": "PLT", "project": "Platelet Count", "project_cn": "血小板计数"},
+        {"abb": "PCT", "project": "Plateletcrit", "project_cn": "血小板压积"},
+        {"abb": "MPV", "project": "Mean Platelet Volume", "project_cn": "平均血小板体积"},
+        {"abb": "PDW", "project": "Platelet Distribution Width", "project_cn": "血小板分布宽度"}
+      ]
+    },
+    "Blood Sugar": {
+      "cn_name": "血糖",
+      "pages": "27-28",
+      "order": 3,
+      "items": [
+        {"abb": "FBS", "project": "Fasting Blood Sugar", "project_cn": "空腹血糖"},
+        {"abb": "HbA1C", "project": "Glycated Hemoglobin", "project_cn": "糖化血红蛋白"}
+      ]
+    },
+    "Lipid Profile": {
+      "cn_name": "血脂",
+      "pages": "29-31",
+      "order": 4,
+      "items": [
+        {"abb": "TC", "project": "Total Cholesterol", "project_cn": "总胆固醇"},
+        {"abb": "TG", "project": "Triglycerides", "project_cn": "甘油三酯"},
+        {"abb": "Lp(a)", "project": "Lipoprotein(a)", "project_cn": "脂蛋白(a)"},
+        {"abb": "HDL", "project": "HDL Cholesterol", "project_cn": "高密度脂蛋白"},
+        {"abb": "LDL", "project": "LDL Cholesterol", "project_cn": "低密度脂蛋白"}
+      ]
+    },
+    "Blood Type": {
+      "cn_name": "血型",
+      "pages": "32-33",
+      "order": 5,
+      "items": [
+        {"abb": "Blood type", "project": "ABO Blood Type", "project_cn": "血型"},
+        {"abb": "Blood type RH", "project": "Rh Blood Type", "project_cn": "RH血型"}
+      ]
+    },
+    "Blood Coagulation": {
+      "cn_name": "凝血功能",
+      "pages": "34-36",
+      "order": 6,
+      "items": [
+        {"abb": "PT", "project": "Prothrombin Time", "project_cn": "凝血酶原时间"},
+        {"abb": "PT%", "project": "Prothrombin Activity", "project_cn": "凝血酶原活动度"},
+        {"abb": "APTT", "project": "Activated Partial Thromboplastin Time", "project_cn": "活化部分凝血活酶时间"},
+        {"abb": "TT", "project": "Thrombin Time", "project_cn": "凝血酶时间"},
+        {"abb": "FIB", "project": "Fibrinogen", "project_cn": "纤维蛋白原"},
+        {"abb": "INR", "project": "International Normalized Ratio", "project_cn": "国际标准化比值"}
+      ]
+    },
+    "Four Infectious Diseases": {
+      "cn_name": "传染病四项",
+      "pages": "37-40",
+      "order": 7,
+      "items": [
+        {"abb": "HIV", "project": "HIV Antibody", "project_cn": "人类免疫缺陷病毒抗体"},
+        {"abb": "TRUST", "project": "Toluidine Red Unheated Serum Test", "project_cn": "梅毒甲苯胺红不加热血清试验"},
+        {"abb": "TPPA", "project": "Treponema Pallidum Particle Agglutination", "project_cn": "梅毒螺旋体特异性抗体定性"},
+        {"abb": "HBsAg", "project": "Hepatitis B Surface Antigen", "project_cn": "乙肝表面抗原"},
+        {"abb": "HBsAb", "project": "Hepatitis B Surface Antibody", "project_cn": "乙肝表面抗体"},
+        {"abb": "HBeAg", "project": "Hepatitis B e Antigen", "project_cn": "乙肝E抗原"},
+        {"abb": "HBeAb", "project": "Hepatitis B e Antibody", "project_cn": "乙肝E抗体"},
+        {"abb": "HBcAb", "project": "Hepatitis B Core Antibody", "project_cn": "乙肝核心抗体"},
+        {"abb": "HCV-IgM", "project": "Hepatitis C Virus IgM Antibody", "project_cn": "丙型肝炎病毒抗体IgM"}
+      ]
+    },
+    "Serum Electrolytes": {
+      "cn_name": "血电解质",
+      "pages": "41-43",
+      "order": 8,
+      "items": [
+        {"abb": "K", "project": "Potassium", "project_cn": "钾"},
+        {"abb": "Na", "project": "Sodium", "project_cn": "钠"},
+        {"abb": "Cl", "project": "Chloride", "project_cn": "氯"},
+        {"abb": "Ca", "project": "Calcium", "project_cn": "钙"},
+        {"abb": "Mg", "project": "Magnesium", "project_cn": "镁"},
+        {"abb": "P", "project": "Phosphorus", "project_cn": "磷"}
+      ]
+    },
+    "Liver Function": {
+      "cn_name": "肝功能",
+      "pages": "44-47",
+      "order": 9,
+      "items": [
+        {"abb": "TP", "project": "Total Protein", "project_cn": "总蛋白"},
+        {"abb": "ALB", "project": "Albumin", "project_cn": "白蛋白"},
+        {"abb": "GLB", "project": "Globulin", "project_cn": "球蛋白"},
+        {"abb": "A/G", "project": "Albumin/Globulin Ratio", "project_cn": "白蛋白/球蛋白"},
+        {"abb": "TBil", "project": "Total Bilirubin", "project_cn": "总胆红素"},
+        {"abb": "DBil", "project": "Direct Bilirubin", "project_cn": "直接胆红素"},
+        {"abb": "IBil", "project": "Indirect Bilirubin", "project_cn": "间接胆红素"},
+        {"abb": "ALP", "project": "Alkaline Phosphatase", "project_cn": "碱性磷酸酶"},
+        {"abb": "ALT", "project": "Alanine Aminotransferase", "project_cn": "丙氨酸氨基转移酶"},
+        {"abb": "AST", "project": "Aspartate Aminotransferase", "project_cn": "天门冬氨酸氨基转移酶"},
+        {"abb": "GGT", "project": "Gamma-Glutamyl Transferase", "project_cn": "γ-谷氨酰转移酶"}
+      ]
+    },
+    "Kidney Function": {
+      "cn_name": "肾功能",
+      "pages": "48-49",
+      "order": 10,
+      "items": [
+        {"abb": "Scr", "project": "Serum Creatinine", "project_cn": "血清肌酐"},
+        {"abb": "BUN", "project": "Blood Urea Nitrogen", "project_cn": "血尿素氮"},
+        {"abb": "UA", "project": "Uric Acid", "project_cn": "尿酸"}
+      ]
+    },
+    "Myocardial Enzyme": {
+      "cn_name": "心肌酶谱",
+      "pages": "50-51",
+      "order": 11,
+      "items": [
+        {"abb": "CK", "project": "Creatine Kinase", "project_cn": "肌酸激酶"},
+        {"abb": "LDH", "project": "Lactate Dehydrogenase", "project_cn": "乳酸脱氢酶"},
+        {"abb": "CK-MB", "project": "Creatine Kinase-MB", "project_cn": "肌酸激酶同工酶"}
+      ]
+    },
+    "Thyroid Function": {
+      "cn_name": "甲状腺功能",
+      "pages": "52-54",
+      "order": 12,
+      "items": [
+        {"abb": "T3", "project": "Triiodothyronine", "project_cn": "三碘甲状腺原氨酸"},
+        {"abb": "T4", "project": "Thyroxine", "project_cn": "甲状腺素"},
+        {"abb": "FT3", "project": "Free Triiodothyronine", "project_cn": "游离三碘甲状腺原氨酸"},
+        {"abb": "FT4", "project": "Free Thyroxine", "project_cn": "游离甲状腺素"},
+        {"abb": "TSH", "project": "Thyroid Stimulating Hormone", "project_cn": "促甲状腺激素"},
+        {"abb": "TgAb", "project": "Thyroglobulin Antibody", "project_cn": "甲状腺球蛋白抗体"}
+      ]
+    },
+    "Thromboembolism": {
+      "cn_name": "心脑血管风险因子",
+      "pages": "55-56",
+      "order": 13,
+      "items": [
+        {"abb": "Hcy", "project": "Homocysteine", "project_cn": "同型半胱氨酸"},
+        {"abb": "D-Dimer", "project": "D-Dimer", "project_cn": "D-二聚体"}
+      ]
+    },
+    "Bone Metabolism": {
+      "cn_name": "骨代谢",
+      "pages": "57-59",
+      "order": 14,
+      "items": [
+        {"abb": "25-OH-VD2+D3", "project": "25-Hydroxyvitamin D2+D3", "project_cn": "25-羟基维生素D2+D3"},
+        {"abb": "PTH", "project": "Parathyroid Hormone", "project_cn": "甲状旁腺激素"},
+        {"abb": "OST", "project": "Osteocalcin", "project_cn": "骨钙素"},
+        {"abb": "TPINP", "project": "Total Procollagen Type 1 N-terminal Propeptide", "project_cn": "总I型胶原氨基端延长肽"},
+        {"abb": "β-CTX", "project": "Beta-CrossLaps", "project_cn": "β-胶原降解产物"}
+      ]
+    },
+    "Microelement": {
+      "cn_name": "微量元素",
+      "pages": "60-62",
+      "order": 15,
+      "items": [
+        {"abb": "Pb", "project": "Lead", "project_cn": "全血微量元素铅"},
+        {"abb": "Cu", "project": "Copper", "project_cn": "全血微量元素铜"},
+        {"abb": "Zn", "project": "Zinc", "project_cn": "全血微量元素锌"},
+        {"abb": "Mg", "project": "Magnesium", "project_cn": "全血微量元素镁"},
+        {"abb": "Fe", "project": "Iron", "project_cn": "全血微量元素铁"}
+      ]
+    },
+    "Lymphocyte Subpopulation": {
+      "cn_name": "淋巴细胞亚群",
+      "pages": "63-64",
+      "order": 16,
+      "items": [
+        {"abb": "CD3+", "project": "T Lymphocyte", "project_cn": "T淋巴细胞"},
+        {"abb": "CD4+", "project": "Helper T Cell", "project_cn": "辅助T细胞"},
+        {"abb": "CD8+", "project": "Cytotoxic T Cell", "project_cn": "细胞毒性T细胞"}
+      ]
+    },
+    "Humoral Immunity": {
+      "cn_name": "体液免疫",
+      "pages": "65-67",
+      "order": 17,
+      "items": [
+        {"abb": "IgG", "project": "Immunoglobulin G", "project_cn": "免疫球蛋白G"},
+        {"abb": "IgA", "project": "Immunoglobulin A", "project_cn": "免疫球蛋白A"},
+        {"abb": "IgM", "project": "Immunoglobulin M", "project_cn": "免疫球蛋白M"},
+        {"abb": "IgE", "project": "Immunoglobulin E", "project_cn": "免疫球蛋白E"},
+        {"abb": "C3", "project": "Complement C3", "project_cn": "补体C3"},
+        {"abb": "C4", "project": "Complement C4", "project_cn": "补体C4"}
+      ]
+    },
+    "Inflammatory Reaction": {
+      "cn_name": "炎症反应",
+      "pages": "68-69",
+      "order": 18,
+      "items": [
+        {"abb": "CRP", "project": "C-Reactive Protein", "project_cn": "C反应蛋白"},
+        {"abb": "hs-CRP", "project": "High-Sensitivity C-Reactive Protein", "project_cn": "超敏C反应蛋白"},
+        {"abb": "ESR", "project": "Erythrocyte Sedimentation Rate", "project_cn": "红细胞沉降率"},
+        {"abb": "ASO", "project": "Anti-Streptolysin O", "project_cn": "抗链球菌溶血素O"}
+      ]
+    },
+    "Autoantibody": {
+      "cn_name": "自身抗体",
+      "pages": "70-71",
+      "order": 19,
+      "items": [
+        {"abb": "ANA", "project": "Antinuclear Antibody", "project_cn": "抗核抗体"},
+        {"abb": "RF", "project": "Rheumatoid Factor", "project_cn": "类风湿因子"}
+      ]
+    },
+    "Female Hormone": {
+      "cn_name": "女性荷尔蒙",
+      "pages": "72-75",
+      "order": 20,
+      "items": [
+        {"abb": "E2", "project": "Estradiol", "project_cn": "雌二醇"},
+        {"abb": "PROG", "project": "Progesterone", "project_cn": "孕酮"},
+        {"abb": "FSH", "project": "Follicle Stimulating Hormone", "project_cn": "促卵泡激素"},
+        {"abb": "LH", "project": "Luteinizing Hormone", "project_cn": "促黄体生成素"},
+        {"abb": "PRL", "project": "Prolactin", "project_cn": "垂体催乳素"},
+        {"abb": "T", "project": "Testosterone", "project_cn": "睾酮"},
+        {"abb": "DHEAS", "project": "Dehydroepiandrosterone Sulfate", "project_cn": "脱氢表雄酮硫酸酯"},
+        {"abb": "COR", "project": "Cortisol", "project_cn": "皮质醇"},
+        {"abb": "IGF-1", "project": "Insulin-like Growth Factor 1", "project_cn": "胰岛素样生长因子1"},
+        {"abb": "AMH", "project": "Anti-Mullerian Hormone", "project_cn": "人抗缪勒氏管激素"}
+      ]
+    },
+    "Male Hormone": {
+      "cn_name": "男性荷尔蒙",
+      "pages": "76-79",
+      "order": 21,
+      "items": [
+        {"abb": "T", "project": "Testosterone", "project_cn": "睾酮"},
+        {"abb": "DHEAS", "project": "Dehydroepiandrosterone Sulfate", "project_cn": "脱氢表雄酮硫酸酯"},
+        {"abb": "IGF-1", "project": "Insulin-like Growth Factor 1", "project_cn": "胰岛素样生长因子1"},
+        {"abb": "PROG", "project": "Progesterone", "project_cn": "孕酮"},
+        {"abb": "FSH", "project": "Follicle Stimulating Hormone", "project_cn": "促卵泡激素"},
+        {"abb": "LH", "project": "Luteinizing Hormone", "project_cn": "促黄体生成素"},
+        {"abb": "PRL", "project": "Prolactin", "project_cn": "垂体催乳素"},
+        {"abb": "Cortisol", "project": "Cortisol", "project_cn": "皮质醇"},
+        {"abb": "E2", "project": "Estradiol", "project_cn": "雌二醇"}
+      ]
+    },
+    "Tumor Markers": {
+      "cn_name": "肿瘤标记物",
+      "pages": "80-84",
+      "order": 22,
+      "items": [
+        {"abb": "AFP", "project": "Alpha-Fetoprotein", "project_cn": "甲胎蛋白"},
+        {"abb": "CEA", "project": "Carcinoembryonic Antigen", "project_cn": "癌胚抗原"},
+        {"abb": "CA19-9", "project": "Carbohydrate Antigen 19-9", "project_cn": "糖类抗原19-9"},
+        {"abb": "Fer", "project": "Ferritin", "project_cn": "铁蛋白"},
+        {"abb": "NSE", "project": "Neuron Specific Enolase", "project_cn": "神经元特异性烯醇化酶"},
+        {"abb": "Tg", "project": "Thyroglobulin", "project_cn": "甲状腺球蛋白"},
+        {"abb": "CT", "project": "Calcitonin", "project_cn": "降钙素"},
+        {"abb": "EA-IgA", "project": "EBV Early Antigen IgA", "project_cn": "EB病毒早期抗原IgA抗体"},
+        {"abb": "TPSA", "project": "Total Prostate Specific Antigen", "project_cn": "男-总前列腺特异性抗原"},
+        {"abb": "FPSA", "project": "Free Prostate Specific Antigen", "project_cn": "男-游离前列腺特异性抗原"},
+        {"abb": "F/TPSA", "project": "Free/Total PSA Ratio", "project_cn": "男-游离/总前列腺特异性抗原"},
+        {"abb": "CA125", "project": "Cancer Antigen 125", "project_cn": "女-糖类抗原125"},
+        {"abb": "CA15-3", "project": "Cancer Antigen 15-3", "project_cn": "女-糖类抗原15-3"},
+        {"abb": "SCC", "project": "Squamous Cell Carcinoma Antigen", "project_cn": "女-鳞状细胞癌抗原"}
+      ]
+    },
+    "Imaging": {
+      "cn_name": "影像学检查",
+      "pages": "85-88",
+      "order": 23,
+      "items": [
+        {"abb": "ECG", "project": "Electrocardiogram", "project_cn": "心电图"},
+        {"abb": "Color Doppler Ultrasound", "project": "Color Doppler Ultrasound", "project_cn": "彩色B超检查"},
+        {"abb": "CT Examination", "project": "CT Examination", "project_cn": "CT检查"}
+      ]
+    },
+    "Female-specific": {
+      "cn_name": "女性专项检查",
+      "pages": "89-91",
+      "order": 24,
+      "items": [
+        {"abb": "Gynecological routine inspection", "project": "Gynecological Routine Inspection", "project_cn": "妇科常规检查"},
+        {"abb": "Gynecological special examination", "project": "Gynecological Special Examination", "project_cn": "妇科专项检查"}
+      ]
+    }
+  },
+  "abb_aliases": {
+    "TUR": "TuR",
+    "BLD/ERY": "BLD",
+    "ERY": "BLD",
+    "TBIL": "TBil",
+    "DBIL": "DBil",
+    "IBIL": "IBil",
+    "A": "ALB",
+    "G": "GLB",
+    "CREA": "Scr",
+    "Cr": "Scr",
+    "FPG": "FBS",
+    "HbA1c": "HbA1C",
+    "HCY": "Hcy",
+    "HOMOCYSTEINE": "Hcy",
+    "25-OH-VitD": "25-OH-VD2+D3",
+    "VitD": "25-OH-VD2+D3",
+    "P1NP": "TPINP",
+    "CTX": "β-CTX",
+    "B-CTX": "β-CTX",
+    "OSTE": "OST",
+    "TESTO": "T",
+    "DHEA-S": "DHEAS",
+    "Cortisol": "COR",
+    "IGF1": "IGF-1",
+    "Anti-HCV": "HCV-IgM",
+    "HCV": "HCV-IgM",
+    "RPR": "TRUST",
+    "SAPA": "TPPA",
+    "PSA": "TPSA",
+    "fPSA": "FPSA",
+    "CA153": "CA15-3",
+    "CA199": "CA19-9",
+    "NES": "NSE",
+    "E/TPSA": "F/TPSA",
+    "CD4": "CD4+",
+    "CD8": "CD8+",
+    "CD3": "CD3+",
+    "Rh-D": "Blood type RH",
+    "RhD": "Blood type RH",
+    "Rh(D)": "Blood type RH",
+    "Rh Factor": "Blood type RH",
+    "Rh": "Blood type RH",
+    "ABO": "Blood type",
+    "PT-INR": "INR",
+    "PT Activity": "PT%",
+    "PTA": "PT%",
+    "Chol": "TC",
+    "CHOL": "TC",
+    "HDL-C": "HDL",
+    "LDL-C": "LDL",
+    "VLDL-C": "VLDL",
+    "Lpa": "Lp(a)",
+    "LPA": "Lp(a)",
+    "PDY": "PDW",
+    "Total RBC": "RBC",
+    "RBCt": "RBC",
+    "CYFRA 21-1": "CYFRA21-1",
+    "Homocysteine": "Hcy",
+    "CD16/CD56": "NK",
+    "NK Cell": "NK",
+    "B Lymphocyte": "B-Lymph",
+    "T Lymphocyte": "T-Lymph",
+    "K+": "K",
+    "Na+": "Na",
+    "Cl-": "Cl",
+    "Ca2+": "Ca",
+    "Mg2+": "Mg",
+    "Kalium": "K",
+    "Sodium": "Na",
+    "Chloride": "Cl",
+    "Calcium": "Ca",
+    "Magnesium": "Mg",
+    "Phosphorus": "P",
+    "TOTALRBC": "RBC",
+    "RBCMORPHOLOGY": "RBC Morphology",
+    "RBC MORPHOLOGY": "RBC Morphology",
+    "LP(A)": "Lp(a)",
+    "FASTINGBLOODSUGAR": "FBS",
+    "BCTX": "β-CTX",
+    "C": "Scr",
+    "Ferritin": "Fer",
+    "FERRITIN": "Fer",
+    "MIB": "Hg",
+    "CIB": "Cd",
+    "Mn": "Mn",
+    "Ni": "Ni",
+    "NAD": "NAD+",
+    "Food Allergy": "Food Intolerance",
+    "Allergen": "Inhalant Allergen",
+    "Turbidity": "TuR",
+    "NEUT#": "NEUT",
+    "Neutrophil": "NEUT",
+    "Neutrophils": "NEUT",
+    "EOS#": "EOS",
+    "Eosinophils": "EOS",
+    "BAS#": "BAS",
+    "Basophils": "BAS",
+    "LYMPH#": "LYMPH",
+    "Lymphocytes": "LYMPH",
+    "MONO#": "MONO",
+    "Monocytes": "MONO",
+    "Mean Cell Hemoglobin": "MCH",
+    "Mean Cell Hb": "MCH",
+    "RDW-CV": "RDW",
+    "RDW-SD": "RDW",
+    "Plateletcrit": "PCT",
+    "Prothrombin Time": "PT",
+    "Thrombin Time": "TT",
+    "Fibrinogen": "FIB",
+    "A/G Ratio": "A/G",
+    "AG Ratio": "A/G",
+    "Albumin/Globulin": "A/G",
+    "Uric Acid": "UA",
+    "URIC": "UA",
+    "Creatine Kinase": "CK",
+    "Total T4": "T4",
+    "Thyroxine": "T4",
+    "TC/HDL Ratio": "TC/HDL",
+    "Chol/HDL": "TC/HDL",
+    "LDL/HDL Ratio": "LDL/HDL",
+    "Copper": "Cu",
+    "CU": "Cu",
+    "Zinc": "Zn",
+    "ZN": "Zn",
+    "Iron": "Fe",
+    "FE": "Fe",
+    "Anti-Streptolysin": "ASO",
+    "Antinuclear": "ANA",
+    "Calcitonin": "CT",
+    "EBV-IgA": "EA-IgA",
+    "Anti-Mullerian": "AMH",
+    "HSCRP": "hs-CRP",
+    "High Sensitivity CRP": "hs-CRP",
+    "TgAb": "TgAb",
+    "TGAB": "TgAb",
+    "Thyroglobulin Antibody": "TgAb",
+    "Anti-Thyroglobulin": "TgAb",
+    "WBC": "WBC count",
+    "White Blood Cell": "WBC count",
+    "Total WBC": "WBC count"
+  },
+  "module_aliases": {
+    "Urine Detection": "Urine Test",
+    "Urinalysis": "Urine Test",
+    "Urine Analysis": "Urine Test",
+    "CBC": "Complete Blood Count",
+    "Blood Count": "Complete Blood Count",
+    "Hematology": "Complete Blood Count",
+    "Glucose": "Blood Sugar",
+    "Blood Glucose": "Blood Sugar",
+    "Glycemic": "Blood Sugar",
+    "Lipid Panel": "Lipid Profile",
+    "Lipids": "Lipid Profile",
+    "Blood Lipid": "Lipid Profile",
+    "Coagulation": "Blood Coagulation",
+    "Clotting": "Blood Coagulation",
+    "Infectious Disease": "Four Infectious Diseases",
+    "Infectious Diseases": "Four Infectious Diseases",
+    "Infection": "Four Infectious Diseases",
+    "Electrolytes": "Serum Electrolytes",
+    "Electrolyte": "Serum Electrolytes",
+    "Ions": "Serum Electrolytes",
+    "Liver": "Liver Function",
+    "Hepatic": "Liver Function",
+    "Kidney": "Kidney Function",
+    "Renal": "Kidney Function",
+    "Renal Function": "Kidney Function",
+    "Cardiac Enzyme": "Myocardial Enzyme",
+    "Heart Enzyme": "Myocardial Enzyme",
+    "Cardiac": "Myocardial Enzyme",
+    "Myocardial Enzyme Spectrum": "Myocardial Enzyme",
+    "Thyroid": "Thyroid Function",
+    "Cardiovascular Risk": "Thromboembolism",
+    "Cardiovascular": "Thromboembolism",
+    "Bone": "Bone Metabolism",
+    "Bone Markers": "Bone Metabolism",
+    "Trace Elements": "Microelement",
+    "Trace Element": "Microelement",
+    "Heavy Metals": "Microelement",
+    "Lymphocyte": "Lymphocyte Subpopulation",
+    "Lymphocyto Subpopulation": "Lymphocyte Subpopulation",
+    "T Cell": "Lymphocyte Subpopulation",
+    "Immunity": "Humoral Immunity",
+    "Immunoglobulin": "Humoral Immunity",
+    "Inflammation": "Inflammatory Reaction",
+    "Inflammatory": "Inflammatory Reaction",
+    "Autoimmune": "Autoantibody",
+    "Autoimmunity": "Autoantibody",
+    "Female": "Female Hormone",
+    "Female Hormones": "Female Hormone",
+    "Male": "Male Hormone",
+    "Male Hormones": "Male Hormone",
+    "Hormone": "Male Hormone",
+    "Tumor": "Tumor Markers",
+    "Cancer Markers": "Tumor Markers",
+    "Oncology": "Tumor Markers",
+    "Radiology": "Imaging",
+    "Image": "Imaging",
+    "Gynecology": "Female-specific",
+    "Gynecological": "Female-specific"
+  }
+}
diff --git a/backend/analyze_output.py b/backend/analyze_output.py
new file mode 100644
index 0000000..1c5dfaa
--- /dev/null
+++ b/backend/analyze_output.py
@@ -0,0 +1,80 @@
+"""Analyze structural problems in the generated file"""
+from docx import Document
+from lxml import etree
+import zipfile
+import os
+
+def analyze_file(filepath, name):
+    """Analyze the structure of a file"""
+    print(f"\n{'='*70}")
+    print(f"Analyzing: {name}")
+    print(f"File: {filepath}")
+    print(f"{'='*70}")
+
+    # Read the XML
+    with zipfile.ZipFile(filepath, 'r') as z:
+        xml_content = z.read('word/document.xml')
+
+    tree = etree.fromstring(xml_content)
+    ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
+    body = tree.find('.//w:body', ns)
+
+    # Find the elements related to Urine Detection
+    print("\nSearching for elements related to 'Urine Detection':")
+    print("-" * 70)
+
+    urine_positions = []
+    for i, elem in enumerate(body):
+        text = ''.join(elem.itertext()).strip()
+        if 'Urine' in text and 'Detection' in text:
+            tag = elem.tag.split('}')[-1]
+            text_preview = text[:100].replace('\n', '\\n')
+            print(f" [{i}] <{tag}>: {text_preview}...")
+            urine_positions.append(i)
+
+    if not urine_positions:
+        print(" Not found")
+        return
+
+    # Analyze the elements that follow the first Urine Detection position
+    first_pos = urine_positions[0]
+    print(f"\n40 elements starting from the first Urine Detection (position {first_pos}):")
+    print("-" * 70)
+
+    for i in range(first_pos, min(first_pos + 40, len(body))):
+        elem = body[i]
+        tag = elem.tag.split('}')[-1]
+        text = ''.join(elem.itertext()).strip()
+        text_preview = text[:80].replace('\n', '\\n') if text else '[empty]'
+
+        # Extra information
+        extra = ""
+        if tag == 'tbl':
+            rows = elem.findall('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tr')
+            extra = f" [rows:{len(rows)}]"
+            # Check whether this is a table header
+            if len(rows) == 1 and ('Abb' in text or 'Project' in text):
+                extra += " [header]"
+        elif tag == 'p':
+            # Check whether there is a page break
+            page_breaks = elem.findall('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br')
+            for br in page_breaks:
+                br_type = br.get('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type')
+                if br_type == 'page':
+                    extra = " [page break]"
+                    break
+
+        print(f" [{i}] <{tag}>{extra}: {text_preview}")
+
+def main():
+    # Template
+    template_path = r"../Be.U Wellness Center功能医学健康报告&定制化方案-案例.docx"
+
+    # Most recently generated file
+    generated_path = "reports/filled_report_20260115_204528.docx"
+
+    analyze_file(template_path, "Template")
+    analyze_file(generated_path, "Generated file")
+
+if __name__ == "__main__":
+    main()
diff --git a/backend/check_title_format.py b/backend/check_title_format.py
new file mode 100644
index 0000000..ea307a2
--- /dev/null
+++ b/backend/check_title_format.py
@@ -0,0 +1,80 @@
+from docx import Document
+from lxml import etree
+
+doc = Document(r'C:\Users\UI\Desktop\医疗报告\backend\reports\filled_report_20260212_165326.docx')
+body = doc.element.body
+children = list(body)
+ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'}
+w = 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'
+
+def show_para_format(elem, label):
+    text = ''.join(elem.itertext()).strip()
+    print(f'=== {label} ===')
+    print(f'Text: {text[:80]}')
+    # pPr
+    pPr = elem.find('w:pPr', ns)
+    if pPr is not None:
+        jc = pPr.find('w:jc', ns)
+        if jc is not None:
+            print(f' jc: {jc.get(f"{{{w}}}val")}')
+        pStyle = pPr.find('w:pStyle', ns)
+        if pStyle is not None:
+            print(f' pStyle: {pStyle.get(f"{{{w}}}val")}')
+    # runs
+    for r in elem.findall('w:r', ns):
+        rPr = r.find('w:rPr', ns)
+        rt = ''.join(r.itertext()).strip()
+        if not rt:
+            continue
+        print(f' Run: "{rt[:50]}"')
+        if rPr is not None:
+            rFonts = rPr.find('w:rFonts', ns)
+            sz = rPr.find('w:sz', ns)
+            szCs = rPr.find('w:szCs', ns)
+            b = rPr.find('w:b', ns)
+            bCs = rPr.find('w:bCs', ns)
+            color = rPr.find('w:color', ns)
+            if rFonts is not None:
+                fonts = {}
+                for attr in ['ascii', 'hAnsi', 'eastAsia', 'cs']:
+                    v = rFonts.get(f'{{{w}}}{attr}')
+                    if v:
+                        fonts[attr] = v
+                print(f' fonts: {fonts}')
+            if sz is not None:
+                print(f' sz: {sz.get(f"{{{w}}}val")} (={int(sz.get(f"{{{w}}}val"))//2}pt)')
+            if szCs is not None:
+                print(f' szCs: {szCs.get(f"{{{w}}}val")}')
+            if b is not None:
+                print(f' bold: yes')
+            if bCs is not None:
+                print(f' boldCs: yes')
+            if color is not None:
+                print(f' color: {color.get(f"{{{w}}}val")}')
+        else:
+            print(f' (no rPr)')
+
+# Overall Health Assessment
+for i, elem in enumerate(children):
+    text = ''.join(elem.itertext()).strip()
+    if 'Overall Health' in text and 'Assessment' in text and len(text) < 200:
+        show_para_format(elem, f'Overall Health Assessment [{i}]')
+        break
+
+print()
+
+# Medical Intervention
+for i, elem in enumerate(children):
+    text = ''.join(elem.itertext()).strip()
+    if 'Medical Intervention' in text and '医学干预' in text and len(text) < 200:
+        show_para_format(elem, f'Medical Intervention [{i}]')
+        break
+
+print()
+
+# FHA Title
+for i, elem in enumerate(children):
+    text = ''.join(elem.itertext()).strip()
+    if 'Functional Medical Health Advice' in text and '功能医学健康建议' in text and len(text) < 300:
+        show_para_format(elem, f'FHA Title [{i}]')
+        break
diff --git a/backend/classify_prompt_reference.md b/backend/classify_prompt_reference.md
new file mode 100644
index 0000000..47fe854
--- /dev/null
+++ b/backend/classify_prompt_reference.md
@@ -0,0 +1,139 @@
+# Medical Test Item Classification Prompt Reference
+
+## Original hard-coded ABB → module mapping (old version, replaced)
+
+```
+# Urinalysis
+COLOR, CLARITY, SG, PH, PRO, GLU, KET, NIT, URO, BIL, LEU, ERY, BLD, CRY, BAC → Urine Detection
+
+# Complete blood count
+WBC, RBC, HB, HGB, HCT, MCV, MCH, MCHC, PLT, RDW, MPV, PDW, NEUT, LYMPH, MONO, EOS, BAS, ESR → Complete Blood Count
+
+# Liver function
+ALT, AST, GGT, ALP, TBIL, DBIL, IBIL, TP, ALB, GLB, A/G, LDH → Liver Function
+
+# Kidney function
+BUN, CREA, CR, UA, EGFR, CYS-C → Kidney Function
+
+# Blood lipids
+TC, TG, HDL, LDL, VLDL, APOA1, APOB, LP(A) → Lipid Panel
+
+# Electrolytes
+NA, K, CL, CA, P, MG, FE, ZN, CU, TCO2, AG → Electrolytes
+
+# Glucose metabolism
+FPG, HBA1C, OGTT, INS, C-PEP, EAG → Glucose
+
+# Thyroid
+TSH, FT3, FT4, T3, T4, TG-AB, TPO-AB → Thyroid
+
+# Hormones (generic, not split by sex)
+E2, PROG, TESTO, FSH, LH, PRL, CORTISOL, DHEA-S, IGF-1 → Hormone
+
+# Tumor markers
+AFP, CEA, CA125, CA153, CA199, PSA, FPSA, NSE, CYFRA21-1, SCC, CA724 → Tumor Markers
+
+# Coagulation
+PT, APTT, TT, FIB, D-DIMER, INR, FDP → Coagulation
+
+# Infectious diseases
+HBSAG, HBSAB, HBEAG, HBEAB, HBCAB, ANTI-HCV, HIV, RPR, TPPA, H.PYLORI → Infectious Disease
+
+# Immune function (lumped together; not split into humoral immunity / inflammation / autoantibodies)
+IGG, IGA, IGM, IGE, C3, C4, CRP, HS-CRP, RF, ANA, ANTI-SM, ANTI-RNP, ASO, NK → Immune Function
+
+# Bone metabolism
+OSTE, P1NP, CTX, PTH, 25-OH-VITD → Bone Metabolism
+
+# Heavy metals
+PB, MN, NI, CR, CD, HG → Heavy Metals
+
+# Vitamins
+VITB12, FOLATE, VITD → Vitamin
+
+# Homocysteine
+HCY → Homocysteine
+
+# Blood type
+ABO, RH → Blood Type
+```
+
+## Original item-name keyword → module mapping (old version, replaced)
+
+```
+urine, urinary → Urine Detection
+blood cell, hemoglobin, platelet, neutrophil → Complete Blood Count
+liver, hepat, bilirubin → Liver Function
+kidney, renal, creatinine → Kidney Function
+cholesterol, triglyceride, lipid → Lipid Panel
+glucose, sugar, hba1c, insulin → Glucose
+thyroid, tsh → Thyroid
+estrogen, testosterone, progesterone, cortisol, hormone → Hormone
+tumor, cancer, antigen → Tumor Markers
+coagul, thrombin, fibrin → Coagulation
+hepatitis, hiv, syphilis → Infectious Disease
+immun, antibod, complement → Immune Function
+bone, osteocalcin → Bone Metabolism
+metal, lead, mercury → Heavy Metals
+vitamin, folate, b12 → Vitamin
+homocysteine → Homocysteine
+```
+
+## Original DeepSeek prompt (old version, replaced)
+
+```
+Determine which test module the following medical test item belongs to, and return only the module name (in English):
+
+Item abbreviation: {abb}
+Item name: {project_name}
+
+Available modules:
+- Urine Detection (尿液检测)
+- Complete Blood Count (血常规)
+- Liver Function (肝功能)
+- Kidney Function (肾功能)
+- Lipid Panel (血脂)
+- Electrolytes (电解质)
+- Glucose (糖代谢)
+- Thyroid (甲状腺功能)
+- Hormone (激素)
+- Tumor Markers (肿瘤标志物)
+- Coagulation (凝血功能)
+- Infectious Disease (传染病)
+- Immune Function (免疫功能)
+- Bone Metabolism (骨代谢)
+- Heavy Metals (重金属)
+- Vitamin (维生素)
+- Other (其他)
+
+Return only the English module name and nothing else.
+```
+
+## Old → new module name mapping
+
+| Old module name | New module name (aligned with abb_mapping_config.json) |
+|---|---|
+| Urine Detection | Urine Test |
+| Lipid Panel | Lipid Profile |
+| Electrolytes | Serum Electrolytes |
+| Glucose | Blood Sugar |
+| Thyroid | Thyroid Function |
+| Hormone (generic) | Female Hormone / Male Hormone (split by sex) |
+| Coagulation | Blood Coagulation |
+| Infectious Disease | Four Infectious Diseases |
+| Immune Function (lumped) | Humoral Immunity / Inflammatory Reaction / Autoantibody (split into 3) |
+| Heavy Metals | Microelement |
+| Vitamin | Removed (folded into Bone Metabolism or Other) |
+| Homocysteine | Thromboembolism |
+| (missing) | Myocardial Enzyme (new) |
+| (missing) | Lymphocyte Subpopulation (new) |
+| (missing) | Imaging (new) |
+| (missing) | Female-specific (new) |
+
+## Classification fixes in the new version
+
+1. **ESR**: the old version placed it in Complete Blood Count → the new version places it in Inflammatory Reaction (per the NH Excel)
+2. **LDH**: the old version placed it in Liver Function → the new version places it in Myocardial Enzyme
+3. **D-Dimer**: the old version placed it in Coagulation → the new version places it in Thromboembolism
+4. **FE/ZN/CU**: the old version placed them in Electrolytes → the new version places them in Microelement
+5. 
**免疫系统**:旧版笼统 Immune Function → 新版拆分为 Humoral Immunity(IgG/IgA/IgM/IgE/C3/C4)、Inflammatory Reaction(CRP/ESR/ASO)、Autoantibody(ANA/RF) diff --git a/backend/compare_format.py b/backend/compare_format.py new file mode 100644 index 0000000..6f34319 --- /dev/null +++ b/backend/compare_format.py @@ -0,0 +1,84 @@ +"""对比模板和生成文件的格式差异""" +from docx import Document +from docx.shared import Pt, Inches +import os + +def analyze_document(filepath, name): + """分析文档结构""" + print(f"\n{'='*60}") + print(f"分析: {name}") + print(f"文件: {filepath}") + print(f"{'='*60}") + + doc = Document(filepath) + + # 找到 Urine Detection 模块 + found_urine = False + urine_start = -1 + + for i, elem in enumerate(doc.element.body): + text = elem.text if hasattr(elem, 'text') and elem.text else '' + if 'Urine' in text and 'Detection' in text: + urine_start = i + found_urine = True + break + + if not found_urine: + print("未找到 Urine Detection 模块") + return + + print(f"\n找到 Urine Detection 位置: {urine_start}") + print(f"\n从 Urine Detection 开始的前30个元素:") + print("-" * 60) + + for i in range(urine_start, min(urine_start + 30, len(doc.element.body))): + elem = doc.element.body[i] + tag = elem.tag.split('}')[-1] + text = elem.text if hasattr(elem, 'text') else '' + text_preview = text[:80].replace('\n', '\\n') if text else '' + + # 获取更多信息 + extra_info = "" + if tag == 'p': + # 检查段落样式 + p_elem = elem + style_elem = p_elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}pStyle') + if style_elem is not None: + extra_info = f" [style: {style_elem.get('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}val')}]" + + # 检查是否有图片 + drawings = p_elem.findall('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}drawing') + if drawings: + extra_info += f" [有图片: {len(drawings)}个]" + + elif tag == 'tbl': + # 统计表格行数和列数 + rows = elem.findall('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}tr') + extra_info = f" [行数: {len(rows)}]" + + print(f" [{i}] <{tag}>{extra_info}: 
{text_preview}") + +def main(): + # Template file + template_path = r"../Be.U Wellness Center功能医学健康报告&定制化方案-案例.docx" + + # Generated file: pick the most recently modified report + reports_dir = "reports" + if os.path.exists(reports_dir): + files = [f for f in os.listdir(reports_dir) if f.startswith('filled_report_') and f.endswith('.docx')] + if files: + files.sort(key=lambda f: os.path.getmtime(os.path.join(reports_dir, f)), reverse=True) + generated_path = os.path.join(reports_dir, files[0]) + else: + print("No generated report file found") + return + else: + print("reports directory does not exist") + return + + # Analyze both documents + analyze_document(template_path, "Template file") + analyze_document(generated_path, "Generated file") + +if __name__ == "__main__": + main() diff --git a/backend/config.py b/backend/config.py new file mode 100644 index 0000000..86c2533 --- /dev/null +++ b/backend/config.py @@ -0,0 +1,412 @@ +""" +Configuration file - central management of paths and parameters +""" +from pathlib import Path +import os + +# Project root directory +PROJECT_ROOT = Path(__file__).parent.parent +BACKEND_ROOT = Path(__file__).parent + +# ==================== Path configuration ==================== + +# PDF input directory (holds the original medical report PDFs) +PDF_INPUT_DIR = Path(r"c:\Users\UI\Desktop\医疗报告\医疗报告智能体") + +# Word template files +TEMPLATE_COMPLETE = BACKEND_ROOT / "template_complete.docx" # used by extract_and_fill_report.py +TEMPLATE_DOCXTPL = PROJECT_ROOT / "template_docxtpl.docx" # used by fill_with_docxtpl.py + +# Configuration file +ABB_MAPPING_CONFIG = BACKEND_ROOT / "abb_mapping_config.json" + +# Output directory +REPORTS_OUTPUT_DIR = BACKEND_ROOT / "reports" +REPORTS_OUTPUT_DIR.mkdir(exist_ok=True) + +# Cache files +EXTRACTED_DATA_FILE = BACKEND_ROOT / "extracted_medical_data.json" +ANALYZED_DATA_FILE = BACKEND_ROOT / "analyzed_medical_data.json" +DEEPSEEK_PROCESSED_DATA_FILE = BACKEND_ROOT / "deepseek_processed_data.json" +DEEPSEEK_CACHE_FILE = BACKEND_ROOT / "deepseek_cache.json" + +# ==================== OCR configuration ==================== + +# Baidu OCR configuration (read from environment variables) +BAIDU_OCR_APP_ID = os.getenv("BAIDU_OCR_APP_ID", "") +BAIDU_OCR_API_KEY = os.getenv("BAIDU_OCR_API_KEY", "") +BAIDU_OCR_SECRET_KEY = os.getenv("BAIDU_OCR_SECRET_KEY", "") + +# ==================== LLM configuration ==================== + +# 
DeepSeek配置 +DEEPSEEK_API_KEY = os.getenv("DEEPSEEK_API_KEY", "") +DEEPSEEK_API_BASE = os.getenv("DEEPSEEK_API_BASE", "https://api.deepseek.com") + +# OpenAI配置 +OPENAI_API_KEY = os.getenv("OPENAI_API_KEY", "") +OPENAI_API_BASE = os.getenv("OPENAI_API_BASE", "") +OPENAI_MODEL = os.getenv("OPENAI_MODEL", "gpt-3.5-turbo") + +# Ollama配置 +OLLAMA_HOST = os.getenv("OLLAMA_HOST", "http://localhost:11434") +OLLAMA_MODEL = os.getenv("OLLAMA_MODEL", "qwen2.5:7b") + +# Coze配置 +COZE_API_KEY = os.getenv("COZE_API_KEY", "") +COZE_WORKFLOW_ID = os.getenv("COZE_WORKFLOW_ID", "") + +# ==================== 功能开关 ==================== + +# 是否启用DeepSeek分析(在extract_and_fill_report.py中使用) +ENABLE_DEEPSEEK_ANALYSIS = os.getenv("ENABLE_DEEPSEEK_ANALYSIS", "false").lower() == "true" + +# ==================== 辅助函数 ==================== + +def load_abb_config() -> dict: + """ + 加载ABB映射配置文件 + + Returns: + dict: 包含以下键的字典: + - modules: 模块配置字典 + - abb_list: 所有ABB列表 + - abb_to_module: ABB到模块的映射 + - abb_to_info: ABB到详细信息的映射 + - abb_aliases: ABB别名映射 + - module_aliases: 模块名称别名映射 + """ + import json + + result = { + 'modules': {}, + 'abb_list': [], + 'abb_to_module': {}, + 'abb_to_info': {}, + 'abb_aliases': {}, + 'module_aliases': {} + } + + if not ABB_MAPPING_CONFIG.exists(): + return result + + with open(ABB_MAPPING_CONFIG, 'r', encoding='utf-8') as f: + config = json.load(f) + + # 新格式:基于模块的配置 + if 'modules' in config and isinstance(config['modules'], dict): + result['modules'] = config['modules'] + result['abb_aliases'] = config.get('abb_aliases', {}) + result['module_aliases'] = config.get('module_aliases', {}) + + # 定义大小写敏感的ABB(这些ABB有大小写冲突,必须精确匹配) + case_sensitive_abbs = {'TG', 'Tg'} # TG=甘油三酯, Tg=甲状腺球蛋白 + + for module_name, module_data in config['modules'].items(): + items = module_data.get('items', []) + for item in items: + abb = item.get('abb', '') + if abb: + result['abb_list'].append(abb) + info = { + 'abb': abb, + 'project': item.get('project', ''), + 'project_cn': item.get('project_cn', ''), 
+ 'module': module_name, + 'module_cn': module_data.get('cn_name', '') + } + # 对于大小写敏感的ABB,使用原始大小写作为key + if abb in case_sensitive_abbs: + result['abb_to_module'][abb] = module_name + result['abb_to_info'][abb] = info + else: + # 其他ABB使用大写作为key(保持向后兼容) + result['abb_to_module'][abb.upper()] = module_name + result['abb_to_info'][abb.upper()] = info + + # 旧格式:items列表 + elif 'items' in config: + for item in config['items']: + abb = item.get('abb', '') + if abb: + result['abb_list'].append(abb) + module = item.get('module', '') + result['abb_to_module'][abb.upper()] = module + result['abb_to_info'][abb.upper()] = { + 'abb': abb, + 'project': item.get('project', ''), + 'project_cn': item.get('project_cn', ''), + 'module': module + } + + return result + + +def normalize_abb(abb: str, config: dict = None) -> str: + """ + 标准化ABB名称(处理别名) + + Args: + abb: 原始ABB名称 + config: ABB配置(可选,如果不提供则自动加载) + + Returns: + 标准化后的ABB名称 + """ + if config is None: + config = load_abb_config() + + aliases = config.get('abb_aliases', {}) + abb_upper = abb.upper() + + # 检查是否有别名 + if abb in aliases: + return aliases[abb] + if abb_upper in aliases: + return aliases[abb_upper] + + return abb + + +def normalize_module_name(module: str, config: dict = None) -> str: + """ + 标准化模块名称(处理DeepSeek返回的不同名称) + + Args: + module: 原始模块名称 + config: ABB配置(可选,如果不提供则自动加载) + + Returns: + 标准化后的模块名称 + """ + if config is None: + config = load_abb_config() + + module_aliases = config.get('module_aliases', {}) + + # 检查是否有别名 + if module in module_aliases: + return module_aliases[module] + + # 尝试不区分大小写匹配 + module_lower = module.lower() + for alias, standard in module_aliases.items(): + if alias.lower() == module_lower: + return standard + + return module + + +def get_standard_module_order() -> list: + """ + 获取标准模块顺序(基于2.pdf模板) + 优先从配置文件读取order字段,确保与2.pdf一致 + + Returns: + 模块名称列表,按标准顺序排列 + """ + config = load_abb_config() + modules = config.get('modules', {}) + + # 如果配置中有order字段,按order排序 + if modules and any('order' in m for m 
in modules.values()): + sorted_modules = sorted( + modules.items(), + key=lambda x: x[1].get('order', 999) + ) + return [name for name, _ in sorted_modules] + + # 默认顺序(与2.pdf一致) + return [ + 'Urine Test', # 1. 尿液检测 (第16-19页) + 'Complete Blood Count', # 2. 血常规 (第20-26页) + 'Blood Sugar', # 3. 血糖 (第27-28页) + 'Lipid Profile', # 4. 血脂 (第29-31页) + 'Blood Type', # 5. 血型 (第32-33页) + 'Blood Coagulation', # 6. 凝血功能 (第34-36页) + 'Four Infectious Diseases', # 7. 传染病四项 (第37-40页) + 'Serum Electrolytes', # 8. 血电解质 (第41-43页) + 'Liver Function', # 9. 肝功能 (第44-47页) + 'Kidney Function', # 10. 肾功能 (第48-49页) + 'Myocardial Enzyme', # 11. 心肌酶谱 (第50-51页) + 'Thyroid Function', # 12. 甲状腺功能 (第52-54页) + 'Thromboembolism', # 13. 心脑血管风险因子 (第55-56页) + 'Bone Metabolism', # 14. 骨代谢 (第57-59页) + 'Microelement', # 15. 微量元素 (第60-62页) + 'Lymphocyte Subpopulation', # 16. 淋巴细胞亚群 (第63-64页) + 'Humoral Immunity', # 17. 体液免疫 (第65-67页) + 'Inflammatory Reaction', # 18. 炎症反应 (第68-69页) + 'Autoantibody', # 19. 自身抗体 (第70-71页) + 'Female Hormone', # 20. 女性荷尔蒙 (第72-75页) + 'Male Hormone', # 21. 男性荷尔蒙 (第76-79页) + 'Tumor Markers', # 22. 肿瘤标记物 (第80-84页) + 'Imaging', # 23. 影像学检查 (第85-88页) + 'Female-specific', # 24. 女性专项检查 (第89-91页) + ] + + +def get_standard_item_order(module_name: str, config: dict = None) -> list: + """ + 获取指定模块的标准项目顺序 + + Args: + module_name: 模块名称 + config: ABB配置(可选) + + Returns: + 该模块的ABB列表,按标准顺序排列 + """ + if config is None: + config = load_abb_config() + + modules = config.get('modules', {}) + if module_name in modules: + items = modules[module_name].get('items', []) + return [item.get('abb', '') for item in items] + + return [] + + +def sort_items_by_standard_order(items: list, module_name: str, config: dict = None) -> list: + """ + 按标准顺序排序项目列表 + + Args: + items: [(abb, data), ...] 
格式的项目列表 + module_name: 模块名称 + config: ABB配置(可选) + + Returns: + 排序后的项目列表,标准项目在前,非标准项目在后 + """ + if config is None: + config = load_abb_config() + + standard_order = get_standard_item_order(module_name, config) + abb_aliases = config.get('abb_aliases', {}) + + # 创建顺序映射(先精确匹配,再大写匹配) + # 大小写敏感的ABB(如TG/Tg)需要精确匹配 + case_sensitive_abbs = {'TG', 'Tg'} + order_map_exact = {abb: i for i, abb in enumerate(standard_order)} + order_map_upper = {abb.upper(): i for i, abb in enumerate(standard_order) if abb not in case_sensitive_abbs} + + # 分离标准项目和非标准项目 + standard_items = [] + extra_items = [] + + for abb, data in items: + # 先标准化ABB(处理别名) + normalized_abb = normalize_abb(abb, config) + + # 先尝试精确匹配(使用标准化后的ABB) + if normalized_abb in order_map_exact: + standard_items.append((abb, data, order_map_exact[normalized_abb])) + # 再尝试原始ABB精确匹配 + elif abb in order_map_exact: + standard_items.append((abb, data, order_map_exact[abb])) + # 再尝试大写匹配(排除大小写敏感的ABB) + elif normalized_abb.upper() in order_map_upper: + standard_items.append((abb, data, order_map_upper[normalized_abb.upper()])) + elif abb.upper() in order_map_upper: + standard_items.append((abb, data, order_map_upper[abb.upper()])) + else: + extra_items.append((abb, data)) + + # 标准项目按顺序排序 + standard_items.sort(key=lambda x: x[2]) + sorted_standard = [(abb, data) for abb, data, _ in standard_items] + + # 非标准项目按ABB字母顺序排序,添加到末尾 + extra_items.sort(key=lambda x: x[0].upper()) + + return sorted_standard + extra_items + + +def get_output_path(prefix: str = "filled_report", suffix: str = ".docx") -> Path: + """ + 生成输出文件路径(自动递增版本号) + + Args: + prefix: 文件名前缀 + suffix: 文件后缀 + + Returns: + 输出文件路径 + """ + existing = list(REPORTS_OUTPUT_DIR.glob(f"{prefix}_*.docx")) + if not existing: + version = 1 + else: + versions = [] + for p in existing: + name = p.stem + try: + # 尝试提取版本号(格式: prefix_v1, prefix_20240101_120000等) + if name.startswith(prefix): + rest = name[len(prefix):] + if rest.startswith('_v'): + v_str = rest[2:] + if v_str.isdigit(): + 
versions.append(int(v_str)) + elif rest.startswith('_') and len(rest) > 1: + # Try to extract a version number following a timestamp + parts = rest.split('_') + if len(parts) > 1: + last_part = parts[-1] + if last_part.isdigit(): + versions.append(int(last_part)) + except ValueError: + continue + version = max(versions) + 1 if versions else 1 + + return REPORTS_OUTPUT_DIR / f"{prefix}_v{version}{suffix}" + + +def check_required_files() -> dict: + """ + Check whether the required files exist + + Returns: + dict: {file label: whether it exists} + """ + return { + "template_complete": TEMPLATE_COMPLETE.exists(), + "template_docxtpl": TEMPLATE_DOCXTPL.exists(), + "config_file": ABB_MAPPING_CONFIG.exists(), + "pdf_input_dir": PDF_INPUT_DIR.exists(), + } + + +def print_config_summary(): + """Print a configuration summary""" + print("\n" + "=" * 70) + print("Configuration summary") + print("=" * 70) + + print("\n[Paths]") + print(f" PDF input directory: {PDF_INPUT_DIR}") + print(f" Template file (extract): {TEMPLATE_COMPLETE}") + print(f" Template file (docxtpl): {TEMPLATE_DOCXTPL}") + print(f" Config file: {ABB_MAPPING_CONFIG}") + print(f" Output directory: {REPORTS_OUTPUT_DIR}") + + print("\n[File check]") + files_status = check_required_files() + for name, exists in files_status.items(): + status = "[OK] present" if exists else "[X] missing" + print(f" {name}: {status}") + + print("\n[API configuration]") + print(f" Baidu OCR: {'[OK] configured' if BAIDU_OCR_API_KEY else '[X] not configured'}") + print(f" DeepSeek: {'[OK] configured' if DEEPSEEK_API_KEY else '[X] not configured'}") + print(f" OpenAI: {'[OK] configured' if OPENAI_API_KEY else '[X] not configured'}") + print(f" Ollama: {'[OK] configured' if OLLAMA_HOST else '[X] not configured'}") + print(f" Coze: {'[OK] configured' if COZE_API_KEY else '[X] not configured'}") + + print("=" * 70 + "\n") + + +if __name__ == '__main__': + # Test the configuration + print_config_summary() diff --git a/backend/deepseek_analyzer.py b/backend/deepseek_analyzer.py new file mode 100644 index 0000000..53f08e9 --- /dev/null +++ b/backend/deepseek_analyzer.py @@ -0,0 +1,283 @@ +""" +DeepSeek medical data analyzer +Uses the DeepSeek API to analyze OCR-extracted medical data and fill in missing reference ranges and units +""" + +import json +import requests +from typing import List, Dict + + +class DeepSeekAnalyzer: + def __init__(self, 
api_key: str): + self.api_key = api_key + self.api_url = "https://api.deepseek.com/v1/chat/completions" + self.headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } + + def analyze_medical_data(self, items: List[Dict]) -> List[Dict]: + """ + 分析医疗数据,补充缺失的参考范围、单位和提示 + + Args: + items: OCR提取的医疗检测项列表 + + Returns: + 补充完整的医疗检测项列表 + """ + # 分批处理,每批20个项目 + batch_size = 20 + all_results = [] + + for i in range(0, len(items), batch_size): + batch = items[i:i+batch_size] + print(f" 处理第 {i//batch_size + 1} 批 ({len(batch)} 项)...") + + result = self._analyze_batch(batch) + if result: + all_results.extend(result) + else: + # 如果API调用失败,保留原始数据 + all_results.extend(batch) + + return all_results + + def _analyze_batch(self, items: List[Dict]) -> List[Dict]: + """分析一批医疗数据""" + + # 构建提示词 + prompt = self._build_prompt(items) + + try: + response = requests.post( + self.api_url, + headers=self.headers, + json={ + "model": "deepseek-chat", + "messages": [ + { + "role": "system", + "content": """你是一个专业的医学检验数据分析专家。你的任务是: +1. 分析医疗检测项目数据 +2. 为缺失参考范围(reference)的项目补充标准参考范围 +3. 为缺失单位(unit)的项目补充正确单位 +4. 
判断结果是否在正常范围内: + - 如果结果在正常范围内,point字段设为空字符串"" + - 如果结果高于正常范围,point字段设为"↑" + - 如果结果低于正常范围,point字段设为"↓" + - 如果是定性结果(如Negative/Positive),且结果正常,point为空;异常则标注 + +请严格按照JSON格式返回,不要添加任何额外说明。""" + }, + { + "role": "user", + "content": prompt + } + ], + "temperature": 0.1, + "max_tokens": 4000 + }, + timeout=60 + ) + + if response.status_code == 200: + result = response.json() + content = result['choices'][0]['message']['content'] + + # 解析JSON响应 + # 处理可能的markdown代码块 + if '```json' in content: + content = content.split('```json')[1].split('```')[0] + elif '```' in content: + content = content.split('```')[1].split('```')[0] + + return json.loads(content.strip()) + else: + print(f" ⚠ API错误: {response.status_code} - {response.text[:100]}") + return None + + except json.JSONDecodeError as e: + print(f" ⚠ JSON解析错误: {e}") + return None + except requests.exceptions.Timeout: + print(" ⚠ API请求超时") + return None + except Exception as e: + print(f" ⚠ 请求错误: {e}") + return None + + def _build_prompt(self, items: List[Dict]) -> str: + """构建分析提示词""" + + # 简化数据,只保留必要字段 + simplified = [] + for item in items: + simplified.append({ + "abb": item.get("abb", ""), + "project": item.get("project", ""), + "result": item.get("result", ""), + "point": item.get("point", ""), + "unit": item.get("unit", ""), + "reference": item.get("reference", "") + }) + + prompt = f"""请分析以下医疗检测数据,补充缺失的参考范围和单位,并判断结果是否正常: + +{json.dumps(simplified, ensure_ascii=False, indent=2)} + +要求: +1. 为每个项目补充完整的reference(参考范围)和unit(单位) +2. 根据result判断是否在正常范围内,设置point字段(正常为空,偏高为"↑",偏低为"↓") +3. 定性结果(如Negative、Positive):正常时point为空,异常时根据具体情况标注 +4. 
保持原有的abb、project、result字段不变 + +请直接返回JSON数组格式,不要添加任何说明文字:""" + + return prompt + + +def test_deepseek(): + """测试DeepSeek API""" + # 需要替换为实际的API Key + api_key = "YOUR_DEEPSEEK_API_KEY" + + analyzer = DeepSeekAnalyzer(api_key) + + # 测试数据 + test_items = [ + {"abb": "WBC", "project": "White Blood Cell", "result": "5.95", "point": "", "unit": "", "reference": ""}, + {"abb": "PRO", "project": "Protein", "result": "Negative", "point": "", "unit": "", "reference": ""}, + {"abb": "GLU", "project": "Glucose", "result": "6.5", "point": "", "unit": "", "reference": ""}, + ] + + result = analyzer.analyze_medical_data(test_items) + print(json.dumps(result, ensure_ascii=False, indent=2)) + + +def call_deepseek(prompt: str, api_key: str) -> str: + """调用DeepSeek API(通用接口)""" + url = "https://api.deepseek.com/v1/chat/completions" + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } + data = { + "model": "deepseek-chat", + "messages": [ + {"role": "user", "content": prompt} + ], + "temperature": 0.1, + "max_tokens": 8000 + } + + response = requests.post(url, headers=headers, json=data, timeout=120) + response.raise_for_status() + return response.json()["choices"][0]["message"]["content"] + + +def process_with_deepseek(ocr_data: list, template_abbs: list, api_key: str) -> dict: + """让DeepSeek处理OCR数据并匹配到模板ABB""" + + prompt = f"""你是医疗数据处理专家。请处理以下OCR提取的医疗检测数据,并匹配到模板中的ABB。 + +## OCR提取的原始数据: +```json +{json.dumps(ocr_data, ensure_ascii=False, indent=2)} +``` + +## 模板中需要填充的ABB列表: +{template_abbs} + +## 任务要求: +1. 清理OCR数据中的错误和噪音 +2. 将每个有效数据项匹配到正确的模板ABB +3. 正确分离result(结果)、unit(单位)、reference(参考范围) +4. 对于尿检项目(如PRO、GLU、KET、NIT等),结果通常是Negative/Positive,这是正确的定性结果 +5. 过滤掉明显错误的数据(如result为"."、"0"、空值等) +6. 如果同一个ABB有多条数据,选择最合理的一条 + +## 输出格式: +请返回JSON格式,结构如下: +```json +{{ + "ABB1": {{"result": "数值或定性结果", "unit": "单位", "reference": "参考范围", "point": "提示"}}, + "ABB2": {{"result": "...", "unit": "...", "reference": "...", "point": ""}}, + ... 
+}} +``` + +只返回JSON,不要其他说明文字。确保JSON格式正确可解析。 +""" + + print("Calling DeepSeek to process the data...") + result = call_deepseek(prompt, api_key) + + # Extract the JSON + try: + # Try parsing the response directly + return json.loads(result) + except json.JSONDecodeError: + # Fall back to extracting from a markdown code block + import re + json_match = re.search(r'```(?:json)?\s*([\s\S]*?)\s*```', result) + if json_match: + return json.loads(json_match.group(1)) + raise ValueError(f"Could not parse the JSON returned by DeepSeek: {result[:500]}") + + +def process_ocr_data_main(): + """ + Main function for processing OCR data (formerly the main function of deepseek_process.py) + """ + from pathlib import Path + import os + + # Read the API key from the environment + api_key = os.environ.get("DEEPSEEK_API_KEY", "") + + if not api_key: + api_key = input("Enter the DeepSeek API Key: ").strip() + + if not api_key: + print("❌ API Key must not be empty") + return + + # Load the OCR data + ocr_file = Path(__file__).parent / "extracted_medical_data.json" + if not ocr_file.exists(): + print("❌ OCR data file not found") + return + + with open(ocr_file, 'r', encoding='utf-8') as f: + data = json.load(f) + ocr_items = data.get('items', data) if isinstance(data, dict) else data + print(f"Loaded {len(ocr_items)} OCR records") + + # Load the template ABB configuration + from config import load_abb_config + config = load_abb_config() + template_abbs = config.get('abb_list', []) + print(f"The template contains {len(template_abbs)} ABBs") + + # Process with DeepSeek + processed_data = process_with_deepseek(ocr_items, template_abbs, api_key) + + # Save the processed data + output_file = Path(__file__).parent / "deepseek_processed_data.json" + with open(output_file, 'w', encoding='utf-8') as f: + json.dump(processed_data, f, ensure_ascii=False, indent=2) + + print(f"✅ DeepSeek processing finished, {len(processed_data)} valid items") + print(f"✅ Saved to: {output_file}") + + return processed_data + + +if __name__ == "__main__": + import sys + if len(sys.argv) > 1 and sys.argv[1] == '--process': + process_ocr_data_main() + else: + test_deepseek() diff --git a/backend/extra_items_handler.py b/backend/extra_items_handler.py new file mode 100644 index 0000000..4327f09 --- /dev/null +++ b/backend/extra_items_handler.py @@ -0,0 +1,694 @@ +""" 
+处理PDF中有但模板中没有的检测项目 +- 识别额外项目 +- 调用DeepSeek进行分类 +- 在对应模块末尾插入表格 +""" + +import json +import requests +from typing import Dict, List, Tuple +from pathlib import Path +from docx import Document +from docx.shared import Pt, Cm +from docx.enum.text import WD_ALIGN_PARAGRAPH +from docx.enum.table import WD_TABLE_ALIGNMENT +from docx.oxml.ns import qn +from docx.oxml import OxmlElement +from copy import deepcopy + + +def clean_reference_range(reference: str) -> str: + """清理参考范围格式:去掉括号,将 List[Dict]: + """ + 识别模板中没有的额外项目 + + Args: + extracted_items: OCR提取的所有项目 + + Returns: + 额外项目列表 + """ + extra_items = [] + + for item in extracted_items: + abb = item.get('abb', '').upper() + + # 跳过空ABB + if not abb: + continue + + # 检查是否在已知ABB中 + if abb not in self.known_abbs: + extra_items.append(item) + + print(f" 识别到 {len(extra_items)} 个额外项目(模板中没有)") + return extra_items + + def classify_items_with_deepseek(self, extra_items: List[Dict]) -> Dict[str, List[Dict]]: + """ + 使用DeepSeek对额外项目进行分类 + + Args: + extra_items: 额外项目列表 + + Returns: + {模块名: [项目列表]} + """ + if not extra_items: + return {} + + if not self.api_key: + print(" ⚠️ 未配置DeepSeek API Key,使用默认分类") + return self._default_classify(extra_items) + + # 构建项目描述 + items_desc = [] + for item in extra_items: + desc = f"- ABB: {item.get('abb', '')}, 项目名: {item.get('project', '')}" + if item.get('result'): + desc += f", 结果: {item.get('result', '')}" + if item.get('unit'): + desc += f" {item.get('unit', '')}" + items_desc.append(desc) + + # 获取可用模块列表 + modules = list(self.abb_config.get('modules', {}).keys()) + + prompt = f"""你是医学检验专家,请将以下检测项目分类到对应的检测模块中。 + +## 待分类的检测项目: +{chr(10).join(items_desc)} + +## 可用的检测模块: +{', '.join(modules)} + +## 要求: +1. 根据项目的医学属性,将每个项目分配到最合适的模块 +2. 如果项目不属于任何已有模块,分配到 "Other Tests" +3. 返回JSON格式 + +## 输出格式: +```json +{{ + "模块名1": ["ABB1", "ABB2"], + "模块名2": ["ABB3"], + ... 
+}} +``` + +只返回JSON,不要其他说明。""" + + try: + headers = { + "Authorization": f"Bearer {self.api_key}", + "Content-Type": "application/json" + } + + response = requests.post( + self.api_url, + headers=headers, + json={ + "model": "deepseek-chat", + "messages": [{"role": "user", "content": prompt}], + "temperature": 0.1, + "max_tokens": 2000 + }, + timeout=60 + ) + + if response.status_code == 200: + content = response.json()['choices'][0]['message']['content'] + + # 解析JSON + if '```json' in content: + content = content.split('```json')[1].split('```')[0] + elif '```' in content: + content = content.split('```')[1].split('```')[0] + + classification = json.loads(content.strip()) + + # 将ABB映射回完整项目数据 + result = {} + abb_to_item = {item['abb'].upper(): item for item in extra_items} + + for module, abbs in classification.items(): + result[module] = [] + for abb in abbs: + abb_upper = abb.upper() + if abb_upper in abb_to_item: + result[module].append(abb_to_item[abb_upper]) + + print(f" ✓ DeepSeek分类完成: {len(result)} 个模块") + return result + else: + print(f" ⚠️ DeepSeek API错误: {response.status_code}") + return self._default_classify(extra_items) + + except Exception as e: + print(f" ⚠️ DeepSeek分类失败: {e}") + return self._default_classify(extra_items) + + def _default_classify(self, extra_items: List[Dict]) -> Dict[str, List[Dict]]: + """默认分类逻辑(当DeepSeek不可用时)""" + # 简单的关键词匹配分类 + result = {'Other Tests': []} + + keyword_to_module = { + 'crp': 'Inflammatory Reaction', + 'esr': 'Inflammatory Reaction', + 'hs-crp': 'Inflammatory Reaction', + 'tgab': 'Thyroid Function', + 'tpoab': 'Thyroid Function', + 'ery': 'Urine Test', + 'cib': 'Microelement', + 'mib': 'Microelement', + } + + for item in extra_items: + abb_lower = item.get('abb', '').lower() + project_lower = item.get('project', '').lower() + + classified = False + for keyword, module in keyword_to_module.items(): + if keyword in abb_lower or keyword in project_lower: + if module not in result: + result[module] = [] + 
result[module].append(item) + classified = True + break + + if not classified: + result['Other Tests'].append(item) + + # 移除空模块 + result = {k: v for k, v in result.items() if v} + + return result + + def generate_clinical_significance(self, items: List[Dict]) -> Dict[str, Dict[str, str]]: + """ + 为额外项目生成临床意义解释 + + Args: + items: 项目列表 + + Returns: + {ABB: {"clinical_en": "...", "clinical_cn": "..."}} + """ + if not items or not self.api_key: + return {} + + items_desc = [] + for item in items: + desc = f"- {item.get('abb', '')}: {item.get('project', '')}" + if item.get('result'): + desc += f", 结果: {item.get('result', '')}" + items_desc.append(desc) + + prompt = f"""你是医学检验专家,请为以下检测项目生成简短的临床意义解释。 + +## 检测项目: +{chr(10).join(items_desc)} + +## 要求: +1. 每个项目提供英文和中文解释 +2. 解释简洁,约30-50字 +3. 说明该指标的临床意义 + +## 输出格式(JSON): +```json +{{ + "ABB1": {{ + "clinical_en": "English explanation...", + "clinical_cn": "中文解释..." + }} +}} +``` + +只返回JSON。""" + + try: + headers = { + "Authorization": f"Bearer {self.api_key}", + "Content-Type": "application/json" + } + + response = requests.post( + self.api_url, + headers=headers, + json={ + "model": "deepseek-chat", + "messages": [{"role": "user", "content": prompt}], + "temperature": 0.1, + "max_tokens": 4000 + }, + timeout=60 + ) + + if response.status_code == 200: + content = response.json()['choices'][0]['message']['content'] + + if '```json' in content: + content = content.split('```json')[1].split('```')[0] + elif '```' in content: + content = content.split('```')[1].split('```')[0] + + return json.loads(content.strip()) + + except Exception as e: + print(f" ⚠️ 生成临床意义失败: {e}") + + return {} + + + def find_module_position(self, doc: Document, module_name: str) -> int: + """ + 在文档中找到指定模块的最后一个表格位置 + + Args: + doc: Word文档对象 + module_name: 模块名称 + + Returns: + 模块最后一个表格在body中的索引,-1表示未找到 + """ + keywords = self.module_keywords.get(module_name, [module_name.lower()]) + + body = doc._body._body + children = list(body) + + module_start_idx = -1 + 
module_end_idx = -1 + + # 找到模块开始位置 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + + for kw in keywords: + if kw in text: + module_start_idx = i + break + + if module_start_idx >= 0: + break + + if module_start_idx < 0: + return -1 + + # 找到模块结束位置(下一个模块开始或文档结束) + all_module_keywords = [] + for kws in self.module_keywords.values(): + all_module_keywords.extend(kws) + + for i in range(module_start_idx + 1, len(children)): + text = ''.join(children[i].itertext()).strip().lower() + + # 检查是否是另一个模块的开始 + for kw in all_module_keywords: + if kw in text and kw not in keywords: + module_end_idx = i + break + + if module_end_idx >= 0: + break + + if module_end_idx < 0: + module_end_idx = len(children) + + # 在模块范围内找最后一个表格 + last_table_idx = -1 + for i in range(module_start_idx, module_end_idx): + if children[i].tag.endswith('}tbl'): + last_table_idx = i + + return last_table_idx + + def create_item_table(self, doc: Document, item: Dict, clinical_en: str = "", clinical_cn: str = "") -> any: + """ + 创建单个检测项目的表格 + + Args: + doc: Word文档对象 + item: 项目数据 + clinical_en: 英文临床意义 + clinical_cn: 中文临床意义 + + Returns: + 创建的表格元素 + """ + # 创建表格(4行6列) + table = doc.add_table(rows=4, cols=6) + table.alignment = WD_TABLE_ALIGNMENT.CENTER + table.autofit = False + + # 设置列宽 + widths = [Cm(2.5), Cm(3.5), Cm(2.5), Cm(2.5), Cm(2.5), Cm(2.5)] + for row in table.rows: + for idx, width in enumerate(widths): + row.cells[idx].width = width + + def set_font(run, bold=False, font_size=10.5): + run.bold = bold + run.font.name = 'Times New Roman' + run.font.size = Pt(font_size) + run._element.rPr.rFonts.set(qn('w:eastAsia'), '宋体') + + # Row 0: 空行(顶部边框) + row0 = table.rows[0] + row0.height = Cm(0.05) + for cell in row0.cells: + cell.text = '' + + # Row 1: 表头 + header_row = table.rows[1] + headers = [ + ('Abb', '简称'), ('Project', '项目'), ('Result', '结果'), + ('Point', '提示'), ('Refer', '参考'), ('Unit', '单位') + ] + for idx, (en, cn) in enumerate(headers): + p = 
header_row.cells[idx].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(f'{en}\n{cn}') + set_font(run, bold=True, font_size=9) + + # Row 2: 数据行 + data_row = table.rows[2] + + # ABB + p = data_row.cells[0].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(item.get('abb', '')) + set_font(run, bold=True) + + # 项目名 + p = data_row.cells[1].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(item.get('project', '')) + set_font(run, bold=True) + + # 结果 + p = data_row.cells[2].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(str(item.get('result', ''))) + set_font(run) + + # Point + p = data_row.cells[3].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(item.get('point', '')) + set_font(run) + + # 参考范围 + p = data_row.cells[4].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(clean_reference_range(item.get('reference', ''))) + set_font(run, font_size=9) + + # 单位 + p = data_row.cells[5].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(item.get('unit', '')) + set_font(run, font_size=9) + + # Row 3: 临床意义(合并单元格) + sig_row = table.rows[3] + top_cell = sig_row.cells[0] + for i in range(1, 6): + top_cell.merge(sig_row.cells[i]) + + p = top_cell.paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.LEFT + + if clinical_en: + run = p.add_run('Clinical Significance: ') + set_font(run, bold=True, font_size=9) + run = p.add_run(clinical_en) + set_font(run, font_size=9) + run = p.add_run('\n') + + if clinical_cn: + run = p.add_run('临床意义:') + set_font(run, bold=True, font_size=9) + run = p.add_run(clinical_cn) + set_font(run, font_size=9) + + # 设置边框 + self._set_table_borders(table) + + return table._tbl + + def _set_table_borders(self, table): + """设置表格边框样式""" + def set_cell_border(cell, **kwargs): + tc = cell._tc + tcPr = tc.get_or_add_tcPr() + tcBorders = OxmlElement('w:tcBorders') + for edge in ['top', 'left', 'bottom', 
'right']: + if edge in kwargs: + element = OxmlElement(f'w:{edge}') + element.set(qn('w:val'), kwargs[edge].get('val', 'single')) + element.set(qn('w:sz'), str(kwargs[edge].get('sz', 4))) + element.set(qn('w:color'), kwargs[edge].get('color', '000000')) + tcBorders.append(element) + tcPr.append(tcBorders) + + border_solid = {'val': 'single', 'sz': 4, 'color': '000000'} + border_dashed = {'val': 'dashed', 'sz': 4, 'color': 'AAAAAA'} + + for i, row in enumerate(table.rows): + for cell in row.cells: + top = border_solid if i == 0 else border_dashed + set_cell_border(cell, top=top, bottom=border_dashed, + left=border_dashed, right=border_dashed) + cell.vertical_alignment = 1 + + def insert_extra_items_to_doc(self, doc_path: str, classified_items: Dict[str, List[Dict]], + explanations: Dict[str, Dict[str, str]] = None) -> str: + """ + 将额外项目插入到文档对应模块末尾 + + Args: + doc_path: 文档路径 + classified_items: {模块名: [项目列表]} + explanations: {ABB: {"clinical_en": "...", "clinical_cn": "..."}} + + Returns: + 处理后的文档路径 + """ + if not classified_items: + print(" 没有额外项目需要插入") + return doc_path + + explanations = explanations or {} + + doc = Document(doc_path) + body = doc._body._body + + inserted_count = 0 + + for module_name, items in classified_items.items(): + if not items: + continue + + print(f" 处理模块 [{module_name}]: {len(items)} 个项目") + + # 找到模块位置 + insert_pos = self.find_module_position(doc, module_name) + + if insert_pos < 0: + print(f" ⚠️ 未找到模块 [{module_name}],跳过") + continue + + # 为每个项目创建表格并插入 + for item in items: + abb = item.get('abb', '').upper() + exp = explanations.get(abb, {}) + clinical_en = exp.get('clinical_en', '') + clinical_cn = exp.get('clinical_cn', '') + + # 创建表格 + table_elem = self.create_item_table(doc, item, clinical_en, clinical_cn) + + # 插入到指定位置后面 + children = list(body) + if insert_pos < len(children): + children[insert_pos].addnext(table_elem) + insert_pos += 1 # 更新位置,下一个表格插入到这个后面 + inserted_count += 1 + print(f" ✓ 插入 {abb}") + + # 保存文档 + doc.save(doc_path) 
+ print(f" ✓ 共插入 {inserted_count} 个额外项目表格") + + return doc_path + + +def process_extra_items(extracted_items: List[Dict], doc_path: str, api_key: str = None) -> str: + """ + 处理额外项目的主函数 + + Args: + extracted_items: OCR提取的所有项目 + doc_path: 已填充的文档路径 + api_key: DeepSeek API Key + + Returns: + 处理后的文档路径 + """ + print("\n" + "=" * 60) + print("处理额外检测项目(模板中没有的项目)") + print("=" * 60) + + handler = ExtraItemsHandler(api_key) + + # 1. 识别额外项目 + extra_items = handler.identify_extra_items(extracted_items) + + if not extra_items: + print(" 没有额外项目需要处理") + return doc_path + + print(f"\n 额外项目列表:") + for item in extra_items: + print(f" - {item.get('abb', '')}: {item.get('project', '')} = {item.get('result', '')}") + + # 2. 使用DeepSeek分类 + print("\n 正在分类...") + classified_items = handler.classify_items_with_deepseek(extra_items) + + if classified_items: + print(f"\n 分类结果:") + for module, items in classified_items.items(): + print(f" [{module}]: {[item.get('abb', '') for item in items]}") + + # 3. 生成临床意义 + print("\n 正在生成临床意义...") + explanations = handler.generate_clinical_significance(extra_items) + + # 4. 
插入到文档 + print("\n 正在插入表格...") + result_path = handler.insert_extra_items_to_doc(doc_path, classified_items, explanations) + + print("\n" + "=" * 60) + print("额外项目处理完成") + print("=" * 60) + + return result_path + + +if __name__ == "__main__": + # 测试 + import os + + api_key = os.getenv("DEEPSEEK_API_KEY", "") + + # 加载提取数据 + extracted_file = Path(__file__).parent / "extracted_medical_data.json" + if extracted_file.exists(): + with open(extracted_file, 'r', encoding='utf-8') as f: + data = json.load(f) + + items = data.get('items', data) if isinstance(data, dict) else data + + handler = ExtraItemsHandler(api_key) + extra_items = handler.identify_extra_items(items) + + print(f"\n识别到 {len(extra_items)} 个额外项目:") + for item in extra_items: + print(f" - {item.get('abb', '')}: {item.get('project', '')}") + + if extra_items and api_key: + classified = handler.classify_items_with_deepseek(extra_items) + print(f"\n分类结果:") + for module, items in classified.items(): + print(f" [{module}]: {[item.get('abb', '') for item in items]}") diff --git a/backend/extract_and_fill_report.py b/backend/extract_and_fill_report.py new file mode 100644 index 0000000..cb49e5e --- /dev/null +++ b/backend/extract_and_fill_report.py @@ -0,0 +1,6489 @@ +""" +从医疗报告PDF中提取数据,匹配模板结构,填入Word模板 +""" +import sys +import io +import os + +# 修复Windows终端中文编码问题 +if sys.platform == 'win32': + # 设置环境变量强制UTF-8 + os.environ['PYTHONIOENCODING'] = 'utf-8' + # 设置控制台代码页为UTF-8 + os.system('chcp 65001 >nul 2>&1') + # 重新配置stdout/stderr + if hasattr(sys.stdout, 'buffer'): + sys.stdout.reconfigure(encoding='utf-8', errors='replace') + sys.stderr.reconfigure(encoding='utf-8', errors='replace') + +import fitz +import json +import re +import time +import requests +import base64 +from pathlib import Path +from docx import Document +from docx.shared import Pt, Cm, Inches +from docx.enum.text import WD_ALIGN_PARAGRAPH +from docx.enum.table import WD_TABLE_ALIGNMENT +from docx.oxml.ns import qn +from docx.oxml import OxmlElement 
+from copy import deepcopy +from dotenv import load_dotenv + +# 加载.env环境变量 +load_dotenv(Path(__file__).parent / ".env") + +# 导入优化版解析函数 +from parse_medical_v2 import parse_medical_data_v2, clean_extracted_data_v2 + + +def find_health_program_boundary(doc): + """ + 动态查找"客户健康方案/Client Health Program"在文档中的位置 + 返回该元素在body.children中的索引,作为保护边界 + + 保护边界之前的所有内容(前四页)不应被修改 + """ + body = doc.element.body + children = list(body) + + for i, elem in enumerate(children): + # 获取元素的文本内容 + text = ''.join(elem.itertext()).strip() + + # 查找"客户健康方案"或"Client Health Program" + if '客户健康方案' in text or 'Client Health Program' in text: + print(f" [保护] 找到保护边界: 位置 {i}, 内容: {text[:50]}...") + # 返回 i+1,这样保护区域包括 "Client Health Program" 本身 + return i + 1 + + # 如果没找到,返回默认值(约80个元素,对应前四页) + print(f" [保护] 未找到'客户健康方案',使用默认边界: 80") + return 80 + + +def find_examination_file_region(doc): + """ + 查找"客户功能医学检测档案/Client Functional Medical Examination File"区域的位置 + 返回 (start_index, end_index) 元组,表示该区域的起始和结束位置 + + 这个区域在尿液检测模块之前,包含客户信息和体检信息,需要保护不被删除 + """ + body = doc.element.body + children = list(body) + + start_idx = -1 + end_idx = -1 + + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip() + + # 查找"客户功能医学检测档案"标题 + if '功能医学检测档案' in text or 'Functional Medical Examination File' in text: + start_idx = i + print(f" [保护] 找到'客户功能医学检测档案'区域起始: 位置 {i}") + + # 查找"尿液检测"标题作为结束边界 + if start_idx >= 0 and ('尿液检测' in text or 'Urine Detection' in text): + end_idx = i + print(f" [保护] 找到'客户功能医学检测档案'区域结束: 位置 {i}") + break + + if start_idx >= 0 and end_idx < 0: + # 如果找到了起始但没找到结束,使用起始位置+20作为结束 + end_idx = start_idx + 20 + print(f" [保护] 未找到结束边界,使用默认: {end_idx}") + + return (start_idx, end_idx) + + +def copy_protected_region_from_template(template_path, output_path, boundary): + """ + 从模板复制保护区域到输出文件(简化版) + + 策略: + 1. 复制模板的前 boundary 个元素(前四页) + 2. 从处理后文件中提取数据部分(从 Client Health Program 之后开始) + 3. 
不再额外复制"客户功能医学检测档案"区域(已在步骤3-7中处理) + """ + import zipfile + import shutil + from lxml import etree + import os + + if boundary <= 0: + print(" [保护] 边界无效,跳过复制") + return + + temp_output = str(output_path) + ".temp_output" + temp_result = str(output_path) + ".temp_result" + + try: + shutil.copy(output_path, temp_output) + + ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'} + w_ns = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' + + with zipfile.ZipFile(template_path, 'r') as z: + template_xml = z.read('word/document.xml') + template_tree = etree.fromstring(template_xml) + template_body = template_tree.find('.//w:body', ns) + + with zipfile.ZipFile(temp_output, 'r') as z: + output_xml = z.read('word/document.xml') + output_tree = etree.fromstring(output_xml) + output_body = output_tree.find('.//w:body', ns) + + if template_body is None or output_body is None: + print(" [保护] 无法找到 body 元素") + return + + template_children = list(template_body) + output_children = list(output_body) + + print(f" [保护] 模板元素: {len(template_children)}, 处理后元素: {len(output_children)}") + + # 在处理后文件中找到数据内容的起始位置 + output_start = -1 + for i, elem in enumerate(output_children): + text = ''.join(elem.itertext()).strip() + if 'Client Health Program' in text or '客户健康方案' in text: + output_start = i + 1 + print(f" [保护] 找到 Client Health Program 位置: {i}") + break + + if output_start < 0: + output_start = boundary + print(f" [保护] 使用默认起始位置: {output_start}") + else: + print(f" [保护] 数据起始位置: {output_start}") + + # 清空模板body,重新构建 + for elem in list(template_body): + template_body.remove(elem) + + # 读取原始模板 + with zipfile.ZipFile(template_path, 'r') as z: + orig_template_xml = z.read('word/document.xml') + orig_template_tree = etree.fromstring(orig_template_xml) + orig_template_body = orig_template_tree.find('.//w:body', ns) + orig_template_children = list(orig_template_body) + + # 1. 
添加模板的前 boundary 个元素(前四页) + added_count = 0 + for i in range(min(boundary, len(orig_template_children))): + elem = orig_template_children[i] + if elem.tag.endswith('}sectPr'): + continue + elem_copy = etree.fromstring(etree.tostring(elem)) + template_body.append(elem_copy) + added_count += 1 + + print(f" [保护] 已添加模板前 {added_count} 个元素") + + # 获取模板的 sectPr(包含页脚引用) + sectPr = None + for elem in orig_template_children: + if elem.tag.endswith('}sectPr'): + sectPr = etree.fromstring(etree.tostring(elem)) + print(f" [保护] 使用模板的 sectPr(包含页脚引用)") + break + + # 2. 添加处理后文件的数据内容部分 + data_count = 0 + for i in range(output_start, len(output_children)): + elem = output_children[i] + if elem.tag.endswith('}sectPr'): + continue + elem_copy = etree.fromstring(etree.tostring(elem)) + template_body.append(elem_copy) + data_count += 1 + + print(f" [保护] 已添加 {data_count} 个数据元素") + + # 3. 添加 sectPr 元素 + if sectPr is not None: + template_body.append(sectPr) + + print(f" [保护] 合并后总元素: {len(list(template_body))}") + + # 保存修改后的 XML + new_xml = etree.tostring(template_tree, xml_declaration=True, encoding='UTF-8', standalone='yes') + + with zipfile.ZipFile(template_path, 'r') as zin: + with zipfile.ZipFile(temp_result, 'w', zipfile.ZIP_DEFLATED) as zout: + for item in zin.infolist(): + if item.filename == 'word/document.xml': + zout.writestr(item, new_xml) + else: + zout.writestr(item, zin.read(item.filename)) + + shutil.move(temp_result, output_path) + print(f" [保护] ✓ 前四页保护完成") + + except Exception as e: + print(f" [保护] 复制失败: {e}") + import traceback + traceback.print_exc() + finally: + for f in [temp_output, temp_result]: + if os.path.exists(f): + try: + os.remove(f) + except OSError: + pass + + +def fix_footer_reference(template_path, output_path): + """ + 修复页脚引用,确保所有页面都有 Be.U Med logo + + 问题:在处理过程中,包含 sectPr 的段落可能被删除或修改,导致页脚引用丢失 + 解决:从模板复制第一个 sectPr 的 footerReference 到输出文件的 sectPr 中 + """ + import zipfile + import shutil + from lxml import etree + import os + + ns = {'w': 
'http://schemas.openxmlformats.org/wordprocessingml/2006/main', + 'r': 'http://schemas.openxmlformats.org/officeDocument/2006/relationships'} + w_ns = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' + r_ns = '{http://schemas.openxmlformats.org/officeDocument/2006/relationships}' + + try: + # 读取模板的 document.xml + with zipfile.ZipFile(template_path, 'r') as z: + template_xml = z.read('word/document.xml') + template_tree = etree.fromstring(template_xml) + template_body = template_tree.find('.//w:body', ns) + + # 找到模板中第一个有 footerReference 的 sectPr + template_sectPrs = template_body.findall('.//w:sectPr', ns) + footer_ref = None + header_refs = [] + + for sectPr in template_sectPrs: + for child in sectPr: + if 'footerReference' in child.tag: + footer_ref = etree.fromstring(etree.tostring(child)) + print(f" [页脚] 找到模板页脚引用: {child.get(r_ns + 'id')}") + if 'headerReference' in child.tag: + header_refs.append(etree.fromstring(etree.tostring(child))) + if footer_ref is not None: + break + + if footer_ref is None: + print(" [页脚] 模板中没有找到页脚引用,跳过") + return + + # 读取输出文件的 document.xml + with zipfile.ZipFile(output_path, 'r') as z: + output_xml = z.read('word/document.xml') + output_tree = etree.fromstring(output_xml) + output_body = output_tree.find('.//w:body', ns) + + # 找到输出文件中的 sectPr(通常在 body 的最后) + output_sectPr = None + for elem in reversed(list(output_body)): + if elem.tag.endswith('}sectPr'): + output_sectPr = elem + break + + if output_sectPr is None: + print(" [页脚] 输出文件中没有找到 sectPr,跳过") + return + + # 检查是否已经有 footerReference + has_footer = False + for child in output_sectPr: + if 'footerReference' in child.tag: + has_footer = True + break + + if has_footer: + print(" [页脚] 输出文件已有页脚引用,跳过") + return + + # 在 sectPr 的开头插入 headerReference 和 footerReference + # 顺序很重要:headerReference 在前,footerReference 在后 + insert_pos = 0 + for header_ref in header_refs: + output_sectPr.insert(insert_pos, header_ref) + insert_pos += 1 + output_sectPr.insert(insert_pos, 
footer_ref) + + print(f" [页脚] 已添加页脚引用到输出文件") + + # 保存修改后的 XML + new_xml = etree.tostring(output_tree, xml_declaration=True, encoding='UTF-8', standalone='yes') + + # 更新输出文件 + temp_result = str(output_path) + '.temp_footer.docx' + with zipfile.ZipFile(output_path, 'r') as zin: + with zipfile.ZipFile(temp_result, 'w', zipfile.ZIP_DEFLATED) as zout: + for item in zin.infolist(): + if item.filename == 'word/document.xml': + zout.writestr(item, new_xml) + else: + zout.writestr(item, zin.read(item.filename)) + + # 替换输出文件 + shutil.move(temp_result, output_path) + print(f" [页脚] ✓ 页脚修复完成") + + except Exception as e: + print(f" [页脚] 修复失败: {e}") + import traceback + traceback.print_exc() + + +def backup_protected_region(doc): + """ + 备份保护区域的所有XML元素(深拷贝) + 返回:(边界位置, 备份的元素列表) + + 重要:备份的是XML元素的深拷贝,可以在文档修改后恢复 + """ + boundary = find_health_program_boundary(doc) + if boundary <= 0: + print(f" [保护] 未找到保护边界,跳过备份") + return -1, [] + + body = doc.element.body + children = list(body) + backup = [] + for i in range(boundary): + backup.append(deepcopy(children[i])) + + print(f" [保护] 已备份保护区域:boundary={boundary}, backup_len={len(backup)}") + return boundary, backup + + +def restore_protected_region(doc, boundary, backup): + """ + 恢复保护区域的所有XML元素 + + 重要:这个函数会完全替换文档开头的元素,确保保护区域完全恢复 + 使用深拷贝确保元素可以正确插入到新文档中 + """ + if boundary <= 0 or not backup: + print(f" [保护] 跳过恢复:boundary={boundary}, backup_len={len(backup) if backup else 0}") + return + + body = doc.element.body + children = list(body) + + print(f" [保护] 开始恢复保护区域:boundary={boundary}, backup_len={len(backup)}, current_children={len(children)}") + + # 删除当前保护区域的所有元素(从后往前删除,避免索引变化问题) + elements_to_remove = children[:min(boundary, len(children))] + for elem in reversed(elements_to_remove): + try: + body.remove(elem) + except Exception as e: + print(f" [保护] 删除元素失败: {e}") + + # 在开头插入备份的元素(从后往前插入到位置0,这样顺序正确) + # 使用深拷贝确保元素可以正确插入到新文档中 + for elem in reversed(backup): + try: + elem_copy = deepcopy(elem) + body.insert(0, elem_copy) + except Exception as 
e: + print(f" [保护] 插入元素失败: {e}") + + print(f" [保护] 恢复完成,当前children数量: {len(list(body))}") + + +def set_cell_border(cell, **kwargs): + """设置单元格边框""" + tc = cell._tc + tcPr = tc.get_or_add_tcPr() + tcBorders = OxmlElement('w:tcBorders') + for edge in ['top', 'left', 'bottom', 'right']: + if edge in kwargs: + element = OxmlElement(f'w:{edge}') + element.set(qn('w:val'), kwargs[edge].get('val', 'single')) + element.set(qn('w:sz'), str(kwargs[edge].get('sz', 4))) + element.set(qn('w:color'), kwargs[edge].get('color', '000000')) + tcBorders.append(element) + tcPr.append(tcBorders) + + +# 配对项目定义 - 这些项目应该在同一个表格中显示(两行数据,共享临床意义) +# 格式: 基础项 -> (配对项, 基础项中文名, 配对项中文名) +PAIRED_ITEMS = { + 'NEUT': ('NEUT%', '中性粒细胞数量', '中性粒细胞百分含量'), + 'EOS': ('EOS%', '嗜酸细胞数量', '嗜酸细胞百分含量'), + 'BAS': ('BAS%', '嗜碱细胞数量', '嗜碱细胞百分含量'), + 'LYMPH': ('LYMPH%', '淋巴细胞数量', '淋巴细胞百分含量'), + 'MONO': ('MONO%', '单核细胞数量', '单核细胞百分含量'), + 'TOTAL RBC': ('RBC COUNT', '红细胞总数', '红细胞计数'), +} + +# 反向映射 - 百分比项 -> 基础项 +PAIRED_ITEMS_REVERSE = {v[0]: k for k, v in PAIRED_ITEMS.items()} + +# 所有配对项目的ABB集合(用于跳过单独处理) +ALL_PAIRED_ABBS = set(PAIRED_ITEMS.keys()) | set(PAIRED_ITEMS_REVERSE.keys()) + + +def get_paired_item(abb): + """ + 获取配对项目信息 + 返回: (paired_abb, is_base, base_cn, percent_cn) + 如果没有配对项目,返回 (None, None, None, None) + """ + abb_upper = abb.upper().strip() + + # 检查是否是基础项 + if abb_upper in PAIRED_ITEMS: + percent_abb, base_cn, percent_cn = PAIRED_ITEMS[abb_upper] + return (percent_abb, True, base_cn, percent_cn) + + # 检查是否是百分比项 + if abb_upper in PAIRED_ITEMS_REVERSE: + base_abb = PAIRED_ITEMS_REVERSE[abb_upper] + _, base_cn, percent_cn = PAIRED_ITEMS[base_abb] + return (base_abb, False, base_cn, percent_cn) + + return (None, None, None, None) + + +def is_paired_item(abb): + """检查是否是配对项目(基础项或百分比项)""" + return abb.upper().strip() in ALL_PAIRED_ABBS + + +def is_paired_base_item(abb): + """检查是否是配对项目的基础项(如NEUT, EOS等)""" + return abb.upper().strip() in PAIRED_ITEMS + + +def is_paired_percent_item(abb): + 
"""检查是否是配对项目的百分比项(如NEUT%, EOS%等)""" + return abb.upper().strip() in PAIRED_ITEMS_REVERSE + + +def clean_reference_range(reference: str) -> str: + """ + 清理参考范围格式: + 1. 去掉括号 + 2. 将 "3.5-5.5" + - "<0.2" -> "0-0.2" + - "≤10" -> "0-10" + - "(阴性)" -> "阴性" + """ + import re + + if not reference: + return reference + + ref = reference.strip() + + # 去掉各种括号 + if ref.startswith('(') and ref.endswith(')'): + ref = ref[1:-1] + elif ref.startswith('(') and ref.endswith(')'): + ref = ref[1:-1] + elif ref.startswith('[') and ref.endswith(']'): + ref = ref[1:-1] + + # 处理只有括号开头的情况 + if ref.startswith('('): + ref = ref[1:] + if ref.endswith(')'): + ref = ref[:-1] + if ref.startswith('('): + ref = ref[1:] + if ref.endswith(')'): + ref = ref[:-1] + + ref = ref.strip() + + # 将 list: + """使用百度OCR高精度+位置版提取PDF,返回带位置信息的结果 + + Args: + pdf_path: PDF文件路径 + max_retries: 每页最大重试次数(网络失败时) + """ + global ACCESS_TOKEN + if not ACCESS_TOKEN: + ACCESS_TOKEN = get_access_token() + if not ACCESS_TOKEN: + print(" ❌ 获取access_token失败") + return [] + + doc = fitz.open(pdf_path) + all_items = [] # 带位置的文本块 + failed_pages = [] # 记录失败的页面 + + url = f"https://aip.baidubce.com/rest/2.0/ocr/v1/accurate?access_token={ACCESS_TOKEN}" + + print(f" PDF共 {len(doc)} 页") + + def ocr_single_page(page_idx, retry_count=0): + """OCR单页,支持重试""" + page = doc[page_idx] + pix = page.get_pixmap(dpi=150) + img_data = pix.tobytes('png') + + try: + img_base64 = base64.b64encode(img_data).decode() + data = {"image": img_base64} + response = requests.post(url, data=data, timeout=30) + result = response.json() + + if 'words_result' in result: + page_items = [] + for item in result['words_result']: + page_items.append({ + 'text': item['words'], + 'location': item.get('location', {}), + 'page': page_idx + 1 + }) + print(f" 第 {page_idx+1} 页: {len(result['words_result'])} 行") + return page_items, True + elif 'error_code' in result: + error_code = result['error_code'] + error_msg = result.get('error_msg', '') + # 网络相关错误码,需要重试 + network_errors 
= [18, 19, 100, 110, 111, 282000, 282003, 282004] + if error_code in network_errors and retry_count < max_retries: + print(f" 第 {page_idx+1} 页网络错误 ({error_code}),{retry_count+1}/{max_retries} 次重试...") + time.sleep(2 * (retry_count + 1)) # 递增等待时间 + return ocr_single_page(page_idx, retry_count + 1) + else: + print(f" 第 {page_idx+1} 页错误: {error_code} - {error_msg}") + return [], False + else: + print(f" 第 {page_idx+1} 页: 未知响应格式") + return [], False + + except requests.exceptions.Timeout: + if retry_count < max_retries: + print(f" 第 {page_idx+1} 页超时,{retry_count+1}/{max_retries} 次重试...") + time.sleep(2 * (retry_count + 1)) + return ocr_single_page(page_idx, retry_count + 1) + else: + print(f" 第 {page_idx+1} 页超时,已达最大重试次数") + return [], False + + except requests.exceptions.ConnectionError: + if retry_count < max_retries: + print(f" 第 {page_idx+1} 页连接失败,{retry_count+1}/{max_retries} 次重试...") + time.sleep(3 * (retry_count + 1)) + return ocr_single_page(page_idx, retry_count + 1) + else: + print(f" 第 {page_idx+1} 页连接失败,已达最大重试次数") + return [], False + + except Exception as e: + if retry_count < max_retries: + print(f" 第 {page_idx+1} 页异常 ({e}),{retry_count+1}/{max_retries} 次重试...") + time.sleep(2 * (retry_count + 1)) + return ocr_single_page(page_idx, retry_count + 1) + else: + print(f" 第 {page_idx+1} 页异常: {e}") + return [], False + + # 第一轮:处理所有页面 + for page_idx in range(len(doc)): + page_items, success = ocr_single_page(page_idx) + if success: + all_items.extend(page_items) + else: + failed_pages.append(page_idx) + time.sleep(0.3) + + # 第二轮:重试失败的页面 + if failed_pages: + print(f"\n ⚠️ {len(failed_pages)} 页提取失败,进行第二轮重试...") + time.sleep(5) # 等待一段时间后重试 + + still_failed = [] + for page_idx in failed_pages: + print(f" 重试第 {page_idx+1} 页...") + page_items, success = ocr_single_page(page_idx) + if success: + all_items.extend(page_items) + else: + still_failed.append(page_idx + 1) # 转为1-based页码 + time.sleep(1) + + if still_failed: + print(f"\n ❌ 以下页面提取失败(可能需要手动检查): {still_failed}") + 
else: + print(f" ✓ 所有失败页面重试成功") + + doc.close() + return all_items + + +def group_by_rows(items: list, y_threshold: int = 15) -> list: + """按Y坐标分组,识别同一行的数据""" + if not items: + return [] + + # 按页和Y坐标排序 + sorted_items = sorted(items, key=lambda x: (x['page'], x['location'].get('top', 0))) + + rows = [] + current_row = [] + last_page = -1 + last_top = -100 + + for item in sorted_items: + page = item['page'] + top = item['location'].get('top', 0) + + # 换页或Y坐标差距大于阈值,开始新行 + if page != last_page or abs(top - last_top) > y_threshold: + if current_row: + # 按X坐标排序同一行的数据 + current_row.sort(key=lambda x: x['location'].get('left', 0)) + rows.append(current_row) + current_row = [item] + last_page = page + last_top = top + else: + current_row.append(item) + + if current_row: + current_row.sort(key=lambda x: x['location'].get('left', 0)) + rows.append(current_row) + + return rows + + +def extract_pdf_text(pdf_path: str) -> str: + """兼容旧接口 - 返回纯文本""" + items = extract_pdf_with_position(pdf_path) + rows = group_by_rows(items) + lines = [] + for row in rows: + line = " ".join([item['text'] for item in row]) + lines.append(line) + return "\n".join(lines) + + +def extract_patient_info(ocr_text: str) -> dict: + """ + 从OCR文本中提取患者基本信息 + + 提取字段: + - name: 姓名 + - gender: 性别(Male→男性, Female→女性) + - age: 年龄(提取数字部分) + - nation: 国籍(默认"中国",OCR中通常没有) + - exam_time: 体检时间(Collected Date) + - project: 体检项目(功能医学检测套餐) + - report_time: 报告时间(使用当前时间) + + Returns: + dict: 包含患者基本信息的字典 + """ + from datetime import datetime + + info = { + 'name': '', + 'gender': '', + 'age': '', + 'nation': '中国', # 默认值 + 'exam_time': '', + 'project': '功能医学检测套餐', # 固定值 + 'report_time': datetime.now().strftime('%Y-%m-%d') # 当前时间 + } + + lines = ocr_text.split('\n') + + # ---------- 中文体检报告格式检测 ---------- + # 格式: "姓名 姚友胜 性别男 体检单号1125041700091 年龄59" + for line in lines[:20]: + if '姓名' in line and ('性别' in line or '年龄' in line): + # 提取姓名 + name_m = re.search(r'姓名\s*(\S+)', line) + if name_m: + raw = name_m.group(1) + # 去掉姓名后面粘连的 
"性别" 等 + raw = re.split(r'性别|年龄|体检', raw)[0] + if raw: + info['name'] = raw + # 提取性别 + gender_m = re.search(r'性别\s*(男|女)', line) + if gender_m: + info['gender'] = '男性' if gender_m.group(1) == '男' else '女性' + # 提取年龄 + age_m = re.search(r'年龄\s*(\d+)', line) + if age_m: + info['age'] = age_m.group(1) + # 提取体检单号中的日期 (格式: 1125041700091 -> 前缀(11)+年(25)+月(04)+日(17)+序号) + id_m = re.search(r'体检单号\s*(\d+)', line) + if id_m: + id_str = id_m.group(1) + if len(id_str) >= 8: + yy = id_str[2:4] + mm = id_str[4:6] + dd = id_str[6:8] + try: + y, m, d = int(yy), int(mm), int(dd) + if 1 <= m <= 12 and 1 <= d <= 31: + info['exam_time'] = f'20{yy}-{mm}-{dd}' + except (ValueError, TypeError): + pass + break # 找到中文患者行后不再继续 + + # ---------- 中文报告体检日期补充 ---------- + for line in lines[:50]: + if '检查日期' in line and not info['exam_time']: + date_m = re.search(r'(\d{4}[-/]\d{1,2}[-/]\d{1,2})', line) + if date_m: + info['exam_time'] = date_m.group(1) + + # ---------- 英文报告格式 ---------- + for line in lines: + line_lower = line.lower().strip() + + # 提取姓名 - Patient Name: MR. SHUNHU YU 或 Patient Name: MS. XXX + if 'patient name' in line_lower: + # 匹配 "Patient Name: XXX" 或 "Patient Name : XXX" + match = re.search(r'patient\s*name\s*[:\:]\s*(.+)', line, re.IGNORECASE) + if match: + name = match.group(1).strip() + # 去掉 MR. / MS. / MRS. 
等称谓 + name = re.sub(r'^(MR\.|MS\.|MRS\.|MISS\.?)\s*', '', name, flags=re.IGNORECASE) + info['name'] = name.strip() + + # 提取性别 - Sex : Male 或 Sex : Female + if 'sex' in line_lower and ('male' in line_lower or 'female' in line_lower): + if 'female' in line_lower: + info['gender'] = '女性' + elif 'male' in line_lower: + info['gender'] = '男性' + + # 提取年龄 - Age : 57Y6M17D 或 Age : 35 + if 'age' in line_lower and ':' in line: + match = re.search(r'age\s*[:\:]\s*(\d+)', line, re.IGNORECASE) + if match: + info['age'] = match.group(1) + + # 提取体检时间 - Collected Date/Time: 20 Dec 2025 或 Collected Date : 2025-07-20 + if 'collected' in line_lower and ('date' in line_lower or 'time' in line_lower): + # 匹配日期格式:20 Dec 2025 或 2025-07-20 或 2025/07/20 + match = re.search(r'collected\s*(?:date)?(?:/time)?\s*[:\:]\s*(.+?)(?:\s+\d{1,2}[:\:]\d{2})?$', line, re.IGNORECASE) + if match: + date_str = match.group(1).strip() + # 尝试解析不同的日期格式 + try: + # 格式: 20 Dec 2025 + parsed = datetime.strptime(date_str, '%d %b %Y') + info['exam_time'] = parsed.strftime('%Y-%m-%d') + except ValueError: + try: + # 格式: 2025-07-20 + parsed = datetime.strptime(date_str, '%Y-%m-%d') + info['exam_time'] = parsed.strftime('%Y-%m-%d') + except ValueError: + try: + # 格式: 2025/07/20 + parsed = datetime.strptime(date_str, '%Y/%m/%d') + info['exam_time'] = parsed.strftime('%Y-%m-%d') + except ValueError: + # 保留原始格式 + info['exam_time'] = date_str + + return info + + +def fill_patient_info_in_template(doc, patient_info: dict): + """ + 在Word模板中填充患者基本信息 + + 模板中有两处需要填充: + 1. 第一处(约段落83-94):可能有示例数据,需要替换 + 2. 
第二处(约段落263-274):空白占位符,需要填充 + + 使用固定格式确保 / 符号对齐(所有 / 在同一列) + + Args: + doc: python-docx Document对象 + patient_info: 患者信息字典 + """ + # 定义字段前缀(使用固定宽度格式确保 / 对齐) + # 英文部分用空格填充到相同宽度,确保 / 在同一列 + # 最长的英文是 "Project"(7字符),统一填充到7字符 + field_formats = { + 'Name': ('Name / 姓名 :', patient_info.get('name', '')), + 'Gender': ('Gender / 性别 :', patient_info.get('gender', '')), + 'Age': ('Age / 年龄 :', patient_info.get('age', '')), + 'Nation': ('Nation / 国籍 :', patient_info.get('nation', '')), + 'Time / 体检': ('Time / 体检时间 :', patient_info.get('exam_time', '')), + 'Project': ('Project / 体检项目 :', patient_info.get('project', '')), + 'Time / 报告': ('Time / 报告时间 :', patient_info.get('report_time', '')), + } + + filled_count = 0 + + for para in doc.paragraphs: + text = para.text.strip() + + # 检查每个字段 + for field_key, (field_format, value) in field_formats.items(): + # 检查段落是否包含该字段的关键词 + if field_key in text: + # 只有当值不为空时才替换 + if value: + # 清空段落内容 + for run in para.runs: + run.text = '' + + # 添加新内容(使用固定格式) + new_text = field_format + value + if para.runs: + para.runs[0].text = new_text + else: + para.add_run(new_text) + + filled_count += 1 + print(f" ✓ 填充: {field_format}{value}") + break # 一个段落只匹配一个字段 + + print(f" 共填充 {filled_count} 个患者信息字段") + return filled_count + + +def parse_medical_data(text: str, source_file: str) -> list: + """从OCR文本中解析医疗检测数据 - OCR每个字段分行""" + items = [] + lines = [l.strip() for l in text.split('\n') if l.strip()] + + # 项目名称到ABB的映射 - 注意优先级:更具体的放前面 + name_to_abb = { + # 血常规 - 按优先级排序,更具体的放前面 + 'mean cell hb concentration': 'MCHC', 'mchc': 'MCHC', # 必须在 hemoglobin 前 + 'follicle stimulating': 'FSH', 'fsh': 'FSH', 'folicle stimulating': 'FSH', # 必须在 hemoglobin 前 + 'mean corpuscular hemoglobin concentration': 'MCHC', + 'mean corpuscular hemoglobin': 'MCH', + 'rbc distribution width': 'RDW', 'rdw': 'RDW', # 必须在 rbc 前 + 'red cell distribution width': 'RDW', + 'total wbc': 'WBC', 'white blood cell': 'WBC', 'wbc': 'WBC', + 'red blood cell': 'RBC', 'rbc count': 'RBC', 'total rbc': 
'RBC', + 'hemoglobin(hb)': 'Hb', 'hemoglobin': 'Hb', # 注意:不要用 'hb' 作为key,会匹配到其他项 + 'hematocrit': 'HCT', 'hct': 'HCT', + 'mean cell volume': 'MCV', 'mcv': 'MCV', 'mean corpuscular volume': 'MCV', + 'platelet count': 'PLT', 'platelet': 'PLT', 'plt': 'PLT', + 'mean platelet volume': 'MPV', 'mpv': 'MPV', + 'neutrophil': 'NEUT', 'neut': 'NEUT', + 'lymphocyte': 'LYMPH', 'lymph': 'LYMPH', + 'monocyte': 'MONO', 'mono': 'MONO', + 'eosinophil': 'EOS', 'eos': 'EOS', + 'basophil': 'BAS', 'bas': 'BAS', + 'esr': 'ESR', 'erythrocyte sedimentation': 'ESR', + 'glucose(fasting)': 'FPG', 'fasting glucose': 'FPG', 'glucose': 'GLU', 'glu': 'GLU', + 'hba1c': 'HbA1c', 'glycated hemoglobin': 'HbA1c', 'haemoglobin a1c': 'HbA1c', 'haemoglobin alc': 'HbA1c', 'hemoglobin a1c': 'HbA1c', + # 血脂 - HDL必须在cholesterol前面,否则会被匹配为TC + 'hdl-cholesterol': 'HDL', 'hdl cholesterol': 'HDL', 'hdl': 'HDL', + 'ldl-cholesterol': 'LDL', 'ldl cholesterol': 'LDL', 'ldl direct': 'LDL', 'ldl': 'LDL', + 'vldl-cholesterol': 'VLDL', 'vldl': 'VLDL', + 'total cholesterol': 'TC', 'cholesterol': 'TC', # 放在HDL/LDL后面 + 'triglyceride': 'TG', 'tg': 'TG', + 'alt': 'ALT', 'sgpt': 'ALT', 'alanine aminotransferase': 'ALT', + 'ast': 'AST', 'sgot': 'AST', 'aspartate aminotransferase': 'AST', + 'gamma glutamyl transferase': 'GGT', 'gamma gt': 'GGT', 'gamma-gt': 'GGT', 'ggt': 'GGT', 'ggt(': 'GGT', + 'alp': 'ALP', 'alkaline phosphatase': 'ALP', + 'total bilirubin': 'TBIL', 'bilirubin total': 'TBIL', 'bilirubin(total)': 'TBIL', + 'direct bilirubin': 'DBIL', 'bilirubin(direct)': 'DBIL', 'bilirubin direct': 'DBIL', + 'ldh': 'LDH', 'lactate dehydrogenase': 'LDH', + 'inr': 'INR', + 'beta crosslap': 'CTX', 'beta-crosslap': 'CTX', + 'anion gap': 'AG', + 'estimated average glucose': 'EAG', + 'total protein': 'TP', + 'albumin': 'ALB', 'alb': 'ALB', + 'globulin': 'GLB', + 'bun': 'BUN', 'urea nitrogen': 'BUN', 'blood urea nitrogen': 'BUN', + 'carcinoembryonic': 'CEA', 'cea': 'CEA', 'carcinoembryonic antigen': 'CEA', + 'uric acid': 'UA', 
'uricacid': 'UA', 'ua': 'UA', 'uric acid.': 'UA', + 'egfr': 'eGFR', + 'tsh': 'TSH', 'thyroid stimulating': 'TSH', + 'ft3': 'FT3', 'free t3': 'FT3', + 'ft4': 'FT4', 'free t4': 'FT4', + 't3': 'T3', 't4': 'T4', + 'estrogen': 'E2', 'estradiol': 'E2', 'estradiol(e2)': 'E2', + 'progesterone': 'PROG', + 'testosterone': 'TESTO', + 'fsh': 'FSH', 'lh': 'LH', + 'cortisol': 'Cortisol', + 'igf-1': 'IGF-1', 'igf1': 'IGF-1', + 'dhea': 'DHEA', 'dhea-s': 'DHEA-S', + 'prolactin': 'PRL', + 'afp': 'AFP', 'alpha fetoprotein': 'AFP', + 'cea': 'CEA', + 'ca125': 'CA125', 'ca 125': 'CA125', + 'ca153': 'CA153', 'ca 15-3': 'CA153', 'carbohydrate antigen 15-3': 'CA153', 'carbohydrate antigen 15': 'CA153', + 'ca199': 'CA199', 'ca 19-9': 'CA199', 'carbohydrate antigen 19-9': 'CA199', 'carbohydrate antigen 19': 'CA199', + 'psa': 'PSA', + 'hepatitis b surface antigen': 'HBsAg', 'hbsag': 'HBsAg', 'hbs ag': 'HBsAg', + 'hepatitis b surface antibody': 'HBsAb', 'hbsab': 'HBsAb', 'anti-hbs': 'HBsAb', 'hbs ab': 'HBsAb', + 'hepatitis be antigen': 'HBeAg', 'hbeag': 'HBeAg', 'hbe ag': 'HBeAg', + 'hepatitis be antibody': 'HBeAb', 'hbeab': 'HBeAb', 'hbe ab': 'HBeAb', + + # 尿检项目 + 'ph': 'PH', 'acidity': 'PH', + 'specific gravity': 'SG', 'sp gravity': 'SG', + 'transparency': 'Clarity', 'clear': 'Clarity', + 'glucose': 'GLU', 'glu': 'GLU', + 'ketone': 'KET', 'ket': 'KET', 'ketones': 'KET', + 'bilirubin': 'BIL', 'bil': 'BIL', + 'urobilinogen': 'URO', 'uro': 'URO', + 'nitrite': 'NIT', 'nit': 'NIT', + 'leukocyte': 'LEU', 'leu': 'LEU', 'leucocyte': 'LEU', + 'erythrocyte': 'ERY', 'ery': 'ERY', + 'color': 'Color', 'colour': 'Color', + 'clarity': 'Clarity', 'turbidity': 'Clarity', 'appearance': 'Clarity', + 'bacteria': 'BAC', 'bact': 'BAC', + 'mucus': 'MUC', + 'yeast': 'Yeast', + 'crystal': 'CRY', + 'hepatitis b core antibody': 'HBcAb', 'hbcab': 'HBcAb', 'anti-hbc': 'HBcAb', 'hbc ab': 'HBcAb', + 'hepatitis c antibody': 'Anti-HCV', 'anti-hcv': 'Anti-HCV', 'hcv ab': 'Anti-HCV', + 'hiv': 'HIV', + 'h.pylori': 'H.pylori 
IgG', 'h. pylori': 'H.pylori IgG', 'helicobacter': 'H.pylori IgG', + 'calcium': 'Ca', # 移除 'ca' 避免误匹配 clinical, context等 + 'phosphorus': 'P', 'phosphate': 'P', + 'iron': 'Fe', 'serum iron': 'Fe', + 'ferritin': 'Ferritin', + 'zinc': 'Zn', 'zn': 'Zn', + 'copper': 'Cu', 'cu': 'Cu', + 'magnesium': 'Mg', 'mg': 'Mg', + 'vitamin b12': 'VitB12', 'vit b12': 'VitB12', 'b12': 'VitB12', + 'folate': 'Folate', 'folic acid': 'Folate', + 'vitamin d': '25-OH-VitD', '25-oh vitamin d': '25-OH-VitD', '25-hydroxy': '25-OH-VitD', 'vitamin d total': '25-OH-VitD', + 'crp': 'CRP', 'c-reactive protein': 'CRP', + 'hs-crp': 'hs-CRP', + 'rf': 'RF', 'rheumatoid factor': 'RF', + 'ana': 'ANA', 'antinuclear antibody': 'ANA', + 'immunoglobulin g': 'IgG', 'immunoglobulin a': 'IgA', 'immunoglobulin m': 'IgM', 'immunoglobulin e': 'IgE', + 'igg': 'IgG', 'iga': 'IgA', 'igm': 'IgM', 'ige': 'IgE', + 'c3': 'C3', 'c4': 'C4', + 'nk cell': 'NK', 'cd16': 'NK', 'cd56': 'NK', + 'osteocalcin': 'OSTE', + 'p1np': 'P1NP', + 'ctx': 'CTX', + 'pth': 'PTH', + 'color': 'Color', 'colour': 'Color', + 'abo group': 'ABO', 'abo blood group': 'ABO', + 'rh group': 'Rh', 'rh blood group': 'Rh', + 'ph': 'pH', + 'specific gravity': 'SG', 'sp gravity': 'SG', 'sg': 'SG', + 'lipoprotein(a)': 'LP(A)', 'lipoprotein a': 'LP(A)', + 'apolipoprotein a1': 'APOA1', 'apolipoprotein a': 'APOA1', + 'apolipoprotein b': 'APOB', + 'protein': 'PRO', + 'ketone': 'KET', 'ket': 'KET', + 'nitrite': 'NIT', 'nit': 'NIT', + 'bilirubin': 'BIL', + 'urobilinogen': 'URO', + 'leukocyte': 'LEU', + # 凝血功能 + 'prothrombin time': 'PT', 'pt': 'PT', 'prothrombin time(pt)': 'PT', + 'thrombin time': 'TT', 'tt': 'TT', 'thrombin time(tt)': 'TT', + 'fibrinogen': 'FIB', 'fibrinogen level': 'FIB', + 'd-dimer': 'D-Dimer', 'fdp d-dimer': 'D-Dimer', + 'aptt': 'APTT', 'activated partial thromboplastin': 'APTT', + # 电解质 + 'sodium': 'Na', 'na': 'Na', + 'potassium': 'K', 'k': 'K', + 'chloride': 'Cl', 'cl': 'Cl', + 'tco2': 'TCO2', 'co2': 'TCO2', + # 同型半胱氨酸 + 'homocysteine': 'HCY', 
'hcy': 'HCY', + # 重金属 + 'lead': 'Pb', 'lead in blood': 'Pb', + 'chromium': 'Cr', 'chromium in blood': 'Cr', + 'manganese': 'Mn', 'manganese in blood': 'Mn', + 'nickel': 'Ni', 'nickel in blood': 'Ni', + # 肿瘤标志物 + 'nse': 'NSE', 'neuron specific enolase': 'NSE', + 'cyfra': 'CYFRA21-1', 'cyfra 21-1': 'CYFRA21-1', + # 血脂比值 + 'cholesterol/hdl-c ratio': 'TC/HDL', 'cholesterol/hdl ratio': 'TC/HDL', 'tc/hdl': 'TC/HDL', + 'ldl/hdl ratio': 'LDL/HDL', 'ldl/hdl': 'LDL/HDL', + # 心肌酶 + 'ck-mb': 'CK-MB', 'ckmb': 'CK-MB', 'creatine kinase-mb': 'CK-MB', + 'creatine kinase': 'CK', 'ck': 'CK', + # 甲状腺 + 'total t4': 'T4', 'totalt4': 'T4', 'thyroxine(t4)': 'T4', + # 炎症 + 'aso': 'ASO', 'anti-streptolysin': 'ASO', 'anti streptolysin': 'ASO', 'aso(anti-streptolysin': 'ASO', + # 自身抗体 + 'anti smith': 'Anti-Sm', 'anti-sm': 'Anti-Sm', + 'anti-n rnp': 'Anti-RNP', 'anti rnp': 'Anti-RNP', + } + + # OCR数据格式多样: + # 格式1: 项目名...: \n 数值 \n 单位 \n (参考范围) + # 格式2: 项目名...: \n 数值 H/L 单位 \n (参考范围) + # 格式3: 项目名...: \n 数值H% \n (参考范围) + + # 跳过关键词 - 注意避免误匹配(如 'tel' 会匹配 'platelet') + skip_words = ['page ', 'patient name', 'doctor:', 'laboratory', 'specimen.', 'specimen type', + 'collected date', 'printed', 'method:', 'bangkok', 'thailand', + 'tel.', 'tel(', 'fax.', 'fax-', 'email:', 'iso 15189', 'iso15189', + 'accreditation', 'lab no.', 'lab no:', 'labno', 'mrn.', 'mrn:', 'requested date', + 'received date', 'address/', 'sex :', 'sex:', 'age :', 'age:', + 'dob :', 'dob:', 'ref.no', 'copyright', 'reported by', 'authorised by', + 'print date', 'remark:', 'remark(', 'confidential', 'this report', + 'reference range', 'test name', 'result unit', 'edta blood', + 'morphology:', 'morphology.', 'adequate', 'differential count', + 'complete blood count', 'issue date', 'revision', 'normal range', + 'for 10-year', 'this equation', 'calculated by', 'outlab', + 'approved by', 'trimester', 'women(', 'female 21', 'post-menopause', + 'cytoplasmic', 'oct1114', 'comment:', 'comment.', 'secs', + 'report by', 'method:', 'method.', 
'age:', 'age .', 'dr:', 'dr.', + 'age...', # 移除了尿检项目过滤词: transparency, erythrocyte.., leucocyte.., urobilinogen.. + # 过滤噪音数据 - 参考范围和标注被误识别 + 'borderline high', 'borderline low', + 'female 12-', 'male 12-', 'female 14-', 'male 14-', 'female 15-', 'male 15-', + 'female 16-', 'male 16-', 'female 17-', 'male 17-', 'female 18-', 'male 18-', + 'female years', 'male years', 'thai male', 'thai female', + 'serum am', 'serum pm', 'years 501', 'years 508', 'years 1717', + 'years 546', 'years 468', 'years 231', 'years 225', + 'scc 0', 'high =', 'low =', 'age = ', 'rbc = 0', 'high = 160', + 'bilirubin = negative', 'bilirubin negative'] + + # 按key长度排序,最长的优先匹配 + sorted_keys = sorted(name_to_abb.keys(), key=len, reverse=True) + + # 需要精确匹配的短key(避免误匹配) + # alt会误匹配cobalt/totalt4, ast会误匹配contrast等 + exact_match_keys = {'ph', 'sg', 'ca', 'mg', 'na', 'k', 'cl', 'p', 'fe', 'zn', 'cu', 'ni', 'cr', 'mn', 'pb', + 'alt', 'ast', 'ggt', 'alp', 'ldh', 'bun', 'ua', 'tg', 'tc', 't3', 't4', 'fsh', 'lh', + 'hb', 'rbc', 'wbc', 'plt', 'mcv', 'mch', 'hct', 'rdw', 'mpv', + 'crp', 'rf', 'ana', 'pth', 'nse', 'cea', 'afp', 'psa', 'hiv'} + + def find_abb(project_name): + """查找项目对应的ABB""" + pl = project_name.lower().strip() + + # 对于短key,要求精确匹配或单词边界匹配 + for key in sorted_keys: + if key in exact_match_keys: + # 精确匹配:项目名就是key,或者key是独立单词 + if pl == key or re.match(rf'^{key}[\s\.\:\d]', pl) or re.search(rf'\b{key}\b', pl): + return name_to_abb[key] + else: + if key in pl: + return name_to_abb[key] + # 生成ABB + words = [w for w in project_name.split() if len(w) > 0 and w[0].isalpha()] + if words: + return ''.join([w[0].upper() for w in words])[:6] + return project_name[:6].upper() + + def parse_value_line(text): + """解析数值行,返回 (result, point, unit)""" + text = text.strip() + result, point, unit = None, '', '' + + # 格式1: "5.7H%" 或 "140H" 或 "230 H mg/dL" 或 "95" (数值开头) + m = re.match(r'^([\d\.]+)\s*([HL])?\s*(.*)$', text, re.IGNORECASE) + if m: + result = m.group(1) + if m.group(2): + point = '↑' if m.group(2).upper() 
== 'H' else '↓' + unit = m.group(3).strip() if m.group(3) else '' + return result, point, unit + + # 格式2: 数值和单位合并 "158.00mg/dL" 或 "247.00mg/dL" + m = re.match(r'^([\d\.]+)([a-zA-Z/%]+[/\w]*)$', text) + if m: + result = m.group(1) + unit = m.group(2) + return result, '', unit + + # 格式3: 定性结果 - 单字母血型(A/B/O/AB)或单词(Positive/Negative/Reactive等) + # 支持后面有额外内容如 "Yellow [Normal: Yellow]" + qualitative_patterns = [ + r'^([ABO]|AB)\b', # 血型 + r'^(Positive|Negative|Reactive|Non[- ]?[Rr]eactive|Normal|Abnormal|Adequate|Yellow|Clear|Straw|Amber)\b', # 定性结果 + ] + for pat in qualitative_patterns: + m = re.match(pat, text, re.IGNORECASE) + if m: + result = m.group(1) + return result, '', '' + + # 格式4: 点号后跟数值 "......... 6.0 (4.5-8.0)" -> 提取6.0 + m = re.match(r'^[\.:\s]+([<>]?\d+\.?\d*)\s*(.*)$', text) + if m: + result = m.group(1) + unit = m.group(2).strip() + return result, '', unit + + return result, point, unit + + i = 0 + while i < len(lines): + line = lines[i].strip() + line_lower = line.lower() + + # 跳过无关行 + if any(w in line_lower for w in skip_words): + i += 1 + continue + + # 跳过空行 + if len(line) == 0: + i += 1 + continue + + # 检查是否是项目名行 (包含 ... 或以 : 结尾) + # 支持中文冒号 : 和英文冒号 : + # 增强:支持特定的已知项目名,即使没有冒号 + known_short_projects = ['ph', 'sg', 'pro', 'glu', 'nit', 'ket', 'bld', 'ery', 'leu', 'wbc', 'rbc', 'color', 'turbidity'] + + # 1. 标准格式:以冒号或点结尾 + is_standard_project = re.match(r'^[A-Za-z][A-Za-z0-9\s\-\(\)\.]+[\.:\uff1a]+\s*$', line) + + # 1.5 以(*)开头的项目名(如 (*)Thrombin Time)- 不需要冒号结尾 + is_star_project = re.match(r'^\(\*\)([A-Za-z][A-Za-z0-9\s\-]+)$', line) + + # 2. 已知短项目名格式:可能是 "pH" 或 "pH 6.0" 或 "pH ..." 
+ is_known_project = False + first_word = line.split()[0].lower().strip('.:') if line else '' + if first_word in known_short_projects: + is_known_project = True + + if is_standard_project or is_known_project or is_star_project: + # Same-line content after a known short name; reset on every iteration so a stale value is never reused + remaining = "" + # Extract the test name + if is_standard_project: + project = re.sub(r'[\.:\uff1a]+\s*$', '', line).strip() + project = re.sub(r'\.+', '', project).strip() + # Strip a leading (*) + project = re.sub(r'^\(\*\)', '', project).strip() + elif is_star_project: + # Take the test name from a line that starts with (*) + project = is_star_project.group(1).strip() + else: + # Known short names may be followed by the result on the same line + parts = line.split(maxsplit=1) + project = parts[0].strip('.:') + # Anything after the name may be the result + remaining = parts[1] if len(parts) > 1 else "" + + abb = find_abb(project) + + # Read the following lines to find the value + result = None + unit = "" + reference = "" + point = "" + + # For a known short name with trailing content, try parsing the result directly + if is_known_project and remaining: + # Try to parse the trailing content + r, p, u = parse_value_line(remaining) + if r: + result = r + point = p + unit = u + + j = i + 1 + # If there is still no result, keep scanning the next lines + while j < len(lines) and j < i + 6 and result is None: + next_line = lines[j].strip() + next_lower = next_line.lower() + + # Skip irrelevant lines + if any(w in next_lower for w in skip_words): + j += 1 + continue + + # Stop when a new test name starts + if re.match(r'^[A-Za-z][A-Za-z0-9\s\-\(\)\.]+[\.:\uff1a]+\s*$', next_line): + break + + # Reference range (in parentheses): check this first + if (next_line.startswith('(') or next_line.startswith('<') or + next_line.startswith('>')) and result is not None: + reference = next_line if next_line.startswith('(') else f'({next_line})' + j += 1 + break + + # Try to parse a value line + if result is None: + r, p, u = parse_value_line(next_line) + if r: + result = r + point = p if p else point + unit = u if u else unit + j += 1 + continue + + # A line holding only the unit + if re.match(r'^[\*a-zA-Z0-9\^\/\%\-\.]+$', next_line) and not next_line[0].isdigit(): + if not unit: + unit = next_line + j += 1 + continue + + j += 1 + + # Save the result, filtering out noise + if result and abb: + project_lower = project.lower() + # Filter noisy test names and invalid results + noise_projects = ['age',
'high', 'low', 'a', 'h', 'l', 'clinical info', + 'context', 'guidelines', 'standards', 'personal data', + 'copyright', 'report', 'specimen', 'method'] + noise_patterns = ['female ', 'male ', 'years ', 'handled following', + 'evolving clinical', 'privacy laws'] + is_noise = ( + project_lower in noise_projects or + (project_lower == 'rbc' and result == '0') or + result in ['.', ':', '-', '/'] or # 无效结果 + len(project) > 50 or # 项目名过长肯定是噪音 + any(p in project_lower for p in noise_patterns) + ) + + if not is_noise: + # 白细胞分类项目特殊处理:根据参考范围判断是数量还是百分比 + # 百分比的参考范围通常是 0-100 之间的数值,如 (46.5-75.0) + # 数量的参考范围通常包含 10^3 或 *10 等单位 + wbc_diff_abbs = {'NEUT', 'LYMPH', 'MONO', 'EOS', 'BAS'} + if abb.upper() in wbc_diff_abbs: + is_percentage = False + # 检查单位是否是百分比 + if unit and '%' in unit: + is_percentage = True + # 检查参考范围是否是百分比形式(0-100之间的数值) + elif reference: + ref_match = re.search(r'\(?([\d\.]+)\s*[-–]\s*([\d\.]+)\)?', reference) + if ref_match: + try: + low = float(ref_match.group(1)) + high = float(ref_match.group(2)) + # 如果参考范围在0-100之间,且没有10^3等单位标识,认为是百分比 + if 0 <= low <= 100 and 0 <= high <= 100 and '10^' not in reference and '*10' not in reference: + is_percentage = True + except: + pass + + if is_percentage: + abb = abb.upper() + '%' + # 如果单位为空,添加% + if not unit: + unit = '%' + + items.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'point': point, + 'unit': unit, + 'reference': reference, + 'source': source_file + }) + + i = j + continue + + # 检查定性结果格式: "项目名...: 结果" 或 "项目名..... . 
结果" + # 更宽松:项目名后有点(可含空格),匹配定性结果 + match = re.match(r'^(.+?)[\.\s]{2,}[:\:]?\s*(Negative|Positive|Non[- ]?Reactive|Reactive|Normal|B|A|AB|O|Yellow|Clear)\b', line, re.IGNORECASE) + if match: + project = match.group(1).strip() + project = re.sub(r'\.+', '', project).strip() + result = match.group(2).strip() + + # 过滤噪音 - 只过滤明确的噪音 + project_lower = project.lower() + is_noise = ( + project_lower in ['age', 'high', 'low', 'a', 'h', 'l'] or + any(p in project_lower for p in ['female ', 'male ', 'years ']) + ) + + if not is_noise: + abb = find_abb(project) + items.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'point': '', + 'unit': '', + 'reference': '', + 'source': source_file + }) + i += 1 + continue + + # 检查带冒号的行中是否直接包含定性结果(备用匹配) + # 如 "HIV-1/HIV-2 Antibody.....: Non Reactive" + match = re.match(r'^([A-Za-z][A-Za-z0-9\s\-\(\)/\.]+)[:\:]+\s*(Non[- ]?[Rr]eactive|Reactive|Negative|Positive|Yellow|Clear)$', line, re.IGNORECASE) + if match: + project = match.group(1).strip() + project = re.sub(r'\.+', '', project).strip() + result = match.group(2) + abb = find_abb(project) + + items.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'point': '', + 'unit': '', + 'reference': '', + 'source': source_file + }) + i += 1 + continue + + # 检查带点号或冒号的行中是否直接包含数值 + # 如 "ESR 1 Hour ...................: 20 H mm/hr" 或 "pH......... 
6.0 (4.5-8.0)" + # 更宽松:项目名后有点(可含空格),结果以数字或<开头 + match = re.match(r'^(.+?)[\.\s]{2,}[:\:]?\s*([<>]?\d+\.?\d*)\s*([HL])?\s*(.*)$', line, re.IGNORECASE) + if match: + project = match.group(1).strip() + project = re.sub(r'\.+', '', project).strip() + result = match.group(2) + point = '↑' if match.group(3) and match.group(3).upper() == 'H' else ('↓' if match.group(3) and match.group(3).upper() == 'L' else '') + rest = match.group(4).strip() if match.group(4) else '' + + # 解析剩余部分获取单位和参考范围 + unit = '' + reference = '' + if rest: + ref_match = re.search(r'\(([^\)]+)\)', rest) + if ref_match: + reference = f'({ref_match.group(1)})' + rest = rest[:ref_match.start()].strip() + unit = rest + + abb = find_abb(project) + + items.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'point': point, + 'unit': unit, + 'reference': reference, + 'source': source_file + }) + i += 1 + continue + + # 备用匹配1: 项目名(括号内容).: 数值 格式 + # 如 "CEA(Carcinoembryonic Antigen).: 1.41" 或 "Vitamin D(25-OH...): 35.00" + match = re.match(r'^([A-Za-z][A-Za-z0-9\s\-]+)\([^\)]+\)[\.:\s]+\s*([<>]?\d+\.?\d*)\s*(.*)$', line) + if match: + project = match.group(1).strip() + result = match.group(2) + rest = match.group(3).strip() + abb = find_abb(project) + unit = '' + reference = '' + if rest: + ref_match = re.search(r'\(([^\)]+)\)', rest) + if ref_match: + reference = f'({ref_match.group(1)})' + rest = rest[:ref_match.start()].strip() + unit = rest + items.append({ + 'abb': abb, 'project': project, 'result': result, + 'point': '', 'unit': unit, 'reference': reference, 'source': source_file + }) + i += 1 + continue + + # 备用匹配2: 连续点号后跟冒号或空格和结果 + # 如 "Color........................ Yellow" 或 "pH......... 
6.0" 或 "Specific Gravity..............: 1.030" + match = re.match(r'^([A-Za-z][A-Za-z0-9\s\-/\(\)]*?)\.{3,}[:\s]+(.+)$', line) + if match: + project = match.group(1).strip() + rest = match.group(2).strip() + abb = find_abb(project) + + # 解析rest:可能是 "Yellow [Normal: Yellow]" 或 "6.0 (4.5-8.0)" 或 "1.030 (1.003-1.030)" + result = None + unit = '' + reference = '' + + # 先尝试提取数值 + num_match = re.match(r'^([<>]?\d+\.?\d*)\s*([HL])?\s*(.*)$', rest, re.IGNORECASE) + if num_match: + result = num_match.group(1) + rest2 = num_match.group(3).strip() + ref_match = re.search(r'\(([^\)]+)\)', rest2) + if ref_match: + reference = f'({ref_match.group(1)})' + rest2 = rest2[:ref_match.start()].strip() + unit = rest2 + else: + # 尝试提取定性结果 + qual_match = re.match(r'^(Negative|Positive|Yellow|Clear|Normal|Non[- ]?Reactive|Reactive)\b', rest, re.IGNORECASE) + if qual_match: + result = qual_match.group(1) + + if result and abb: + items.append({ + 'abb': abb, 'project': project, 'result': result, + 'point': '', 'unit': unit, 'reference': reference, 'source': source_file + }) + i += 1 + continue + + i += 1 + + return items + + +def clean_extracted_data(items: list) -> list: + """清洗提取的数据,修复常见OCR解析错误""" + import re + + cleaned = [] + + for item in items: + abb = item.get('abb', '').upper() + result = item.get('result', '') + unit = item.get('unit', '') + project = item.get('project', '') + reference = item.get('reference', '') + + # 1. 过滤明显的噪音数据 + if abb in ['A', 'H', 'L', 'R', 'AGE']: + continue + if project.lower() in ['age', 'high', 'low', 'received', 'collected']: + continue + if 'phase' in project.lower() or 'trimester' in project.lower(): + continue + + # 2. 
修复result在unit字段的情况(如Color的Yellow) + if result in ['', '.', '-', '/'] and unit: + # 颜色值 + colors = ['yellow', 'amber', 'straw', 'colorless', 'red', 'brown', 'dark', 'clear'] + for color in colors: + if color in unit.lower(): + result = color.capitalize() + # 从unit中提取参考范围 + if '[' in unit and 'normal' in unit.lower(): + ref_match = re.search(r'\[.*?(\d.*?)\]', unit, re.IGNORECASE) + if ref_match: + reference = ref_match.group(1) + unit = '' + break + + # 定性结果 + qualitative = ['negative', 'positive', 'reactive', 'non-reactive', 'normal'] + for q in qualitative: + if q in unit.lower(): + result = q.capitalize() + unit = '' + break + + # 3. 过滤无效结果 + if result in ['', '.', '-', '/', '00', '99', '999']: + continue + + # 4. 修复unit中包含参考范围的情况 + if unit and ('[' in unit or 'normal' in unit.lower()): + # 提取真正的单位 + unit_match = re.match(r'^([a-zA-Z0-9\^/%\*]+)', unit) + if unit_match: + real_unit = unit_match.group(1) + if len(real_unit) <= 15: + unit = real_unit + else: + unit = '' + else: + unit = '' + + # 5. 修复特定ABB的数据 + # pH应该在4.0-9.0范围 + if abb == 'PH': + try: + val = float(result.replace(',', '.')) + if not (4.0 <= val <= 9.0): + continue + except: + continue + + # SG应该在1.000-1.050范围 + if abb == 'SG': + try: + val = float(result.replace(',', '.')) + if not (1.000 <= val <= 1.050): + continue + except: + continue + + # 6. 
更新item + item['result'] = result + item['unit'] = unit + if reference and not item.get('reference'): + item['reference'] = reference + + cleaned.append(item) + + return cleaned + + +def extract_all_pdfs(pdf_dir: str) -> tuple: + """提取目录下所有PDF的数据 + + Returns: + tuple: (all_items, ocr_texts) - 检测项列表和每个PDF的OCR原文字典 + """ + pdf_path = Path(pdf_dir) + pdf_files = list(pdf_path.glob("*.pdf")) + + all_items = [] + ocr_texts = {} # {pdf_name: ocr_text} + + for pdf_file in pdf_files: + print(f"\n📄 处理: {pdf_file.name}") + text = extract_pdf_text(str(pdf_file)) + ocr_texts[pdf_file.name] = text # 保留OCR原文供后续复用 + # 使用优化版解析函数 + items = parse_medical_data_v2(text, pdf_file.name) + print(f" ✓ 提取 {len(items)} 个检测项") + all_items.extend(items) + + # 清洗数据 - 使用优化版清洗函数 + all_items = clean_extracted_data_v2(all_items) + print(f"\n ✓ 清洗后保留 {len(all_items)} 个有效检测项") + + return all_items, ocr_texts + + +def match_with_template(extracted_items: list, template_config: dict) -> dict: + """将提取的数据与模板结构匹配""" + import re + + # 兼容新旧配置格式 + if 'items' in template_config: + # 旧格式 + template_items = template_config['items'] + elif 'modules' in template_config: + # 新格式:从modules中提取所有items + template_items = [] + for module_name, module_data in template_config['modules'].items(): + for item in module_data.get('items', []): + template_items.append({ + 'abb': item.get('abb', ''), + 'project': item.get('project', ''), + 'project_cn': item.get('project_cn', ''), + 'module': module_name + }) + else: + template_items = [] + + # 结果有效性验证规则 + def is_valid_result(abb, result): + """检查结果是否对该项目有效""" + if not result: + return False + result_lower = result.lower().strip() + abb_upper = abb.upper() + + # 定性结果项目 + qualitative = ['PRO', 'GLU', 'KET', 'BIL', 'NIT', 'URO', 'LEU', 'BLD', + 'HBSAG', 'HBSAB', 'HBEAG', 'HBEAB', 'HBCAB', 'ANTI-HCV', 'HIV', 'RPR', + 'ANA', 'ANTI-SM', 'ANTI-RNP', 'RF'] + valid_qualitative = ['negative', 'positive', 'trace', 'normal', 'abnormal', + 'reactive', 'non-reactive', 'nonreactive', 'weak 
positive', + '1+', '2+', '3+', '4+', '+-'] + + if abb_upper in qualitative: + # 定性结果有效 + if result_lower in valid_qualitative or result_lower.replace('+', '').replace('-', '') in ['1', '2', '3', '4']: + return True + # 数值结果也有效(有些定性项目也有定量结果,如HBsAb抗体滴度) + if re.search(r'\d', result): + return True + return False + + # 血型 + if abb_upper in ['ABO', 'RH']: + return result_lower in ['a', 'b', 'ab', 'o', 'positive', 'negative', 'rh+', 'rh-', '+', '-'] + + # 颜色 + if abb_upper == 'COLOR': + return result_lower in ['yellow', 'amber', 'straw', 'colorless', 'red', 'brown', 'dark'] + + # pH值 + if abb_upper == 'PH': + try: + val = float(result.replace(',', '.')) + return 4.0 <= val <= 9.0 + except: + return False + + # 比重SG + if abb_upper == 'SG': + try: + val = float(result.replace(',', '.')) + return 1.000 <= val <= 1.050 + except: + return False + + # 数值型结果 - 检查是否包含数字 + if re.search(r'\d', result): + # 排除明显错误的值 + if len(result) > 30: # 太长 + return False + if result_lower in ['00', '99', '999']: # 占位符 + return False + return True + + return False + + # 建立ABB索引 + template_by_abb = {} + for item in template_items: + abb = item['abb'].upper() + template_by_abb[abb] = item + # 处理别名 + if '/' in abb: + for part in abb.split('/'): + template_by_abb[part] = item + + # 先按ABB分组提取数据(使用大写作为key进行匹配,但保留原始ABB) + items_by_abb = {} + original_abb_map = {} # 保存原始ABB大小写 + for item in extracted_items: + abb_upper = item['abb'].upper() + original_abb = item['abb'] # 保留原始大小写 + if abb_upper not in items_by_abb: + items_by_abb[abb_upper] = [] + original_abb_map[abb_upper] = original_abb # 记录原始ABB + items_by_abb[abb_upper].append(item) + + matched = {} + unmatched = [] + + for abb_upper, items in items_by_abb.items(): + original_abb = original_abb_map.get(abb_upper, abb_upper) # 获取原始ABB + + # 过滤有效结果 + valid_items = [i for i in items if is_valid_result(abb_upper, i.get('result', ''))] + + if not valid_items: + # 如果没有有效项,使用第一个(可能是定性结果) + valid_items = items[:1] + + # 选择最佳匹配(优先选择有异常标记的,其次是有单位和参考范围的) + 
best = valid_items[0] + for item in valid_items: + score = 0 + # 异常标记权重最高(+10分) + point = item.get('point', '').strip() + if point in ['↑', '↓', 'H', 'L', '高', '低']: + score += 10 + if item.get('unit'): score += 1 + if item.get('reference'): score += 1 + if item.get('project'): score += 1 + + best_point = best.get('point', '').strip() + best_score = (10 if best_point in ['↑', '↓', 'H', 'L', '高', '低'] else 0) + \ + (1 if best.get('unit') else 0) + \ + (1 if best.get('reference') else 0) + \ + (1 if best.get('project') else 0) + if score > best_score: + best = item + + + # 匹配到模板(使用原始ABB作为key) + if abb_upper in template_by_abb: + # 直接匹配优先 + if original_abb not in matched: # 避免重复覆盖 + # 添加模块信息和中文项目名称 + best['module'] = template_by_abb[abb_upper].get('module', '') + # 使用配置文件中的中文项目名称 + if template_by_abb[abb_upper].get('project_cn'): + best['project_cn'] = template_by_abb[abb_upper]['project_cn'] + matched[original_abb] = best # 使用原始ABB作为key + else: + # 模糊匹配 - 只匹配有意义的相似性,避免'R' in 'COLOR'这种错误 + found = False + for t_abb in template_by_abb: + # 要求至少3个字符匹配,且匹配部分占比高 + if len(abb_upper) >= 3 and len(t_abb) >= 3: + if abb_upper == t_abb: + if original_abb not in matched: + # 添加模块信息和中文项目名称 + best['module'] = template_by_abb[t_abb].get('module', '') + if template_by_abb[t_abb].get('project_cn'): + best['project_cn'] = template_by_abb[t_abb]['project_cn'] + matched[original_abb] = best # 使用原始ABB作为key + found = True + break + if not found: + unmatched.append(best) + + print(f"\n匹配结果: {len(matched)} 个匹配, {len(unmatched)} 个未匹配") + + # 将未匹配的项目也加入结果中,以便后续作为缺失项目处理 + for item in unmatched: + original_abb = item.get('abb', '') # 使用原始ABB + if original_abb and original_abb not in matched: + matched[original_abb] = item + + return matched + + +def remove_placeholder_tables(doc): + """ + 删除原有模板中的数据行(包括占位符行和已填充数据行) + 保留:模块标题行 + 删除:表头行、数据行、Clinical Significance行 + + 注意:模块标题表格最终应该只剩下1行(模块标题行) + """ + import re + removed_count = 0 + + # 模块标题关键词(完整的模块名称) + module_title_patterns = [ + 'blood 
sugar', 'blood count', 'complete blood count', 'urine detection', 'urine test', + 'liver function', 'kidney function', 'lipid profile', 'lipid panel', + 'thyroid function', 'thyroid', 'tumor marker', 'electrolyte', 'serum electrolyte', + 'coagulation', 'blood coagulation', 'immune', 'humoral immunity', + 'bone metabolism', 'infectious disease', 'four infectious', + 'heavy metal', 'microelement', 'trace element', + 'cardiovascular', 'thromboembolism', 'autoantibody', 'autoimmune', + 'blood type', 'inflammatory', 'lymphocyte', + 'female hormone', 'male hormone', 'female-specific', 'imaging', + 'myocardial enzyme', 'cardiac enzyme', + '血常规', '尿液检测', '肝功能', '肾功能', '血脂', '甲状腺功能', '甲状腺', + '肿瘤标志物', '电解质', '血糖', '凝血功能', '凝血', '体液免疫', '免疫功能', + '骨代谢', '传染病', '重金属', '微量元素', '心脑血管', '自身抗体', + '血型', '炎症', '淋巴细胞', '女性激素', '男性激素', '女性专项', '影像', + '心肌酶', '女性荷尔蒙', '男性荷尔蒙' + ] + + def is_module_title_row(row_text): + """ + 判断是否是真正的模块标题行 + 模块标题行的特征: + 1. 完整的模块名称重复出现多次(如 "Blood Sugar\n血糖 Blood Sugar\n血糖...") + 2. 
行文本主要由模块名称组成,没有其他数据内容 + """ + row_text_lower = row_text.lower() + + # 检查是否有完整的模块名称重复出现 + for pattern in module_title_patterns: + count = row_text_lower.count(pattern) + if count >= 3: # 模块标题行通常重复3次以上 + # 额外检查:行文本长度应该与重复的模块名称长度相近 + pattern_total_len = len(pattern) * count + if len(row_text_lower) < pattern_total_len * 3: + return True + return False + + for table in doc.tables: + rows_to_remove = [] + + for row_idx, row in enumerate(table.rows): + row_text = ' '.join([c.text for c in row.cells]).strip() + row_text_lower = row_text.lower() + + # 空行:删除 + if not row_text or row_text.replace(' ', '') == '': + rows_to_remove.append(row) + continue + + # 模块标题行:保留 + if is_module_title_row(row_text): + # 如果包含占位符,清除占位符文本但保留行 + if '{{' in row_text: + placeholder_pattern = re.compile(r'\{\{[^}]*\}\}') + for cell in row.cells: + if '{{' in cell.text: + cell.text = placeholder_pattern.sub('', cell.text).strip() + continue + + # Clinical Significance行:删除(会在后续步骤中重新生成) + if 'clinical significance' in row_text_lower or '临床意义' in row_text: + rows_to_remove.append(row) + continue + + # 其他所有行都删除(包括表头行和数据行) + rows_to_remove.append(row) + + # 删除标记的行 + for row in rows_to_remove: + try: + tbl = table._tbl + tbl.remove(row._tr) + removed_count += 1 + except: + pass + + return removed_count + + +def find_module_title_position(doc, module_name): + """ + 找到模块标题在body中的位置 + 返回模块标题表格的位置,新表格应插入到这个位置之后 + + 注意:模块标题在模板中是表格的第一行,不是段落 + + 关键区分: + - 模块标题表格:标题行是重复的模块名称(如 "Blood Sugar\n血糖 Blood Sugar\n血糖...") + - 数据表格:Clinical Significance 行是长文本描述,可能包含关键词但不是标题 + """ + # 标准模块名称到搜索关键词的映射 + module_titles = { + # 24个标准模块 + 'Urine Test': ['urine test', 'urine detection', '尿液检测', '尿常规'], + 'Complete Blood Count': ['complete blood count', 'cbc', '血常规'], + 'Blood Sugar': ['blood sugar', '糖代谢', '血糖'], + 'Lipid Profile': ['lipid profile', 'lipid panel', '血脂'], + 'Blood Type': ['blood type', '血型'], + 'Blood Coagulation': ['blood coagulation', 'coagulation', '凝血功能', '凝血'], + 'Four Infectious Diseases': ['infectious 
disease', '传染病', 'four infectious'], + 'Serum Electrolytes': ['serum electrolyte', 'electrolyte', '电解质', '血清电解质'], + 'Liver Function': ['liver function', '肝功能'], + 'Kidney Function': ['kidney function', '肾功能'], + 'Myocardial Enzyme': ['myocardial enzyme', 'cardiac enzyme', '心肌酶', '心肌酶谱'], + 'Thyroid Function': ['thyroid function', '甲状腺功能', '甲功'], + 'Thromboembolism': ['thromboembolism', 'cardiovascular risk', '心脑血管', '血栓'], + 'Bone Metabolism': ['bone metabolism', '骨代谢'], + 'Microelement': ['microelement', 'trace element', 'heavy metal', '微量元素', '重金属'], + 'Lymphocyte Subpopulation': ['lymphocyte subpopulation', 'lymphocyte', '淋巴细胞亚群'], + 'Humoral Immunity': ['humoral immunity', 'immune function', '体液免疫', '免疫功能'], + 'Inflammatory Reaction': ['inflammatory reaction', 'inflammation', '炎症', '血沉'], + 'Autoantibody': ['autoantibody', 'autoimmune', '自身抗体', '自身免疫'], + 'Female Hormone': ['female hormone', '女性激素', '女性荷尔蒙'], + 'Male Hormone': ['male hormone\n男性荷尔蒙', '男性激素', '男性荷尔蒙male hormone'], + 'Tumor Markers': ['tumor marker', '肿瘤标志物'], + 'Imaging': ['imaging', '影像'], + 'Female-specific': ['female-specific', 'gynecological', '妇科', '女性专项'], + } + + titles = module_titles.get(module_name, [module_name.lower()]) + body = doc.element.body + + def is_module_title_row(row_text): + """ + 判断是否是模块标题行(而不是 Clinical Significance 行) + + 模块标题行特征: + 1. 包含重复的模块名称(如 "Blood Sugar\n血糖 Blood Sugar\n血糖...") + 2. 不以 "Clinical Significance" 开头 + 3. 不包含长描述性内容 + """ + row_text_lower = row_text.lower().strip() + + # 排除 Clinical Significance 行 + if row_text_lower.startswith('clinical significance'): + return False + if '临床意义' in row_text and len(row_text) > 100: + return False + + # 检查是否是重复模式的标题行 + # 模块标题行通常是 "Module Name\n中文名 Module Name\n中文名..." 
这种重复模式 + for title in titles: + title_lower = title.lower() + # 如果关键词在文本中出现多次(>=2),很可能是标题行 + if row_text_lower.count(title_lower) >= 2: + # 额外检查:排除包含长描述的Clinical Significance行 + # Clinical Significance行通常包含这些描述性词汇 + cs_indicators = ['used to', 'helps to', 'reflects', 'indicates', 'evaluating', + 'diagnosis of', 'marker of', 'assessment', 'screening'] + if any(ind in row_text_lower for ind in cs_indicators) and len(row_text) > 500: + return False + return True + # 如果文本很短且包含关键词,也可能是标题行 + if len(row_text) < 150 and title_lower in row_text_lower: + # 额外检查:排除包含描述性词汇的行 + description_words = ['content', 'level', 'reflects', 'indicates', 'assisting', + 'diagnosis', 'evaluating', 'normal', 'reference'] + if not any(dw in row_text_lower for dw in description_words): + return True + + return False + + # 遍历所有表格找模块标题 + for i, table in enumerate(doc.tables): + if len(table.rows) == 0: + continue + # 只检查前3行 + for row_idx in range(min(3, len(table.rows))): + row_text = ' '.join([c.text.strip() for c in table.rows[row_idx].cells]) + row_text_lower = row_text.lower() + + # 检查是否包含关键词 + if any(title in row_text_lower for title in titles): + # 进一步验证是否是模块标题行 + if is_module_title_row(row_text): + # 找到模块标题,返回该表格在body中的位置 + tbl_element = table._tbl + for idx, child in enumerate(body): + if child is tbl_element: + return idx + + return -1 + + +def detect_gender(matched_data: dict, abb_config: dict) -> str: + """ + 【已弃用】根据匹配到的荷尔蒙项目检测性别 + + 注意:此函数已不再使用。现在统一从OCR文本中提取性别信息(通过patient_info['gender'])。 + 保留此函数仅作为备用参考。 + + 原判断逻辑: + 1. 如果有 AMH(抗缪勒氏管激素)→ 女性(AMH 只在女性荷尔蒙模块中) + 2. 如果有 TPSA/FPSA(前列腺特异性抗原)→ 男性(前列腺是男性特有器官) + 3. 如果有 CA125/CA15-3/SCC(女性肿瘤标志物)→ 女性 + 4. 
如果都没有,检查 E2(雌二醇)的值:女性 E2 通常 > 50 pmol/L + + 注意:COR/Cortisol 不参与判断,因为它是需要根据性别分配的项目 + """ + # Load the alias mapping + abb_aliases = abb_config.get('abb_aliases', {}) + + # Helper that normalizes an ABB + def normalize(abb): + abb_upper = abb.upper().strip() + return abb_aliases.get(abb, abb_aliases.get(abb_upper, abb)).upper() + + # Scan the matched items + has_amh = False # female-only + has_psa = False # male-only + has_female_tumor_markers = False # female tumor markers + e2_value = None # estradiol value + + for abb, data in matched_data.items(): + result = data.get('result', '') + if not result or result in ['', '.', '-', '/']: + continue + + abb_upper = abb.upper().strip() + normalized = normalize(abb) + + # AMH (female-only) + if normalized == 'AMH' or abb_upper == 'AMH': + has_amh = True + print(f" 发现 AMH(抗缪勒氏管激素)→ 女性特有项目") + + # PSA (male-only) + if normalized in ['TPSA', 'FPSA', 'PSA', 'F/TPSA'] or abb_upper in ['TPSA', 'FPSA', 'PSA', 'F/TPSA']: + has_psa = True + print(f" 发现 {abb}(前列腺特异性抗原)→ 男性特有项目") + + # Female tumor markers + if normalized in ['CA125', 'CA15-3', 'CA153', 'SCC'] or abb_upper in ['CA125', 'CA15-3', 'CA153', 'SCC']: + has_female_tumor_markers = True + print(f" 发现 {abb}(女性肿瘤标志物)→ 女性特有项目") + + # Record the E2 value + if normalized == 'E2' or abb_upper == 'E2': + try: + e2_value = float(result.replace(',', '').strip()) + print(f" 发现 E2(雌二醇)= {e2_value}") + except (ValueError, AttributeError): + pass + + # Decide the gender + if has_psa: + print(f" ✓ 检测结果: 男性 (发现前列腺特异性抗原)") + return 'male' + + if has_amh or has_female_tumor_markers: + print(f" ✓ 检测结果: 女性 (发现女性特有项目)") + return 'female' + + # With an E2 value, decide by threshold (female E2 is usually > 50 pmol/L) + if e2_value is not None: + if e2_value > 50: + print(f" ✓ 检测结果: 女性 (E2 = {e2_value} > 50)") + return 'female' + else: + print(f" ✓ 检测结果: 男性 (E2 = {e2_value} <= 50)") + return 'male' + + # Default to female (COR originally sat in the female module) + print(f" ✓ 检测结果: 女性 (默认)") + return 'female' + + +def fill_word_template_new(template_path: str, matched_data: dict, output_path: str, api_key: str = None, patient_info: dict = None): + """ + 新版填充逻辑: + 1. 按照2.pdf标准模块顺序和项目顺序排列 + 2. 先删除原有占位符表格行 + 3. 
为每个ABB单独创建新表格结构 + 4. 未匹配到标准项目的数据通过DeepSeek分析后添加到对应模块尾部 + + Args: + template_path: Word模板路径 + matched_data: 匹配的数据字典 + output_path: 输出文件路径 + api_key: DeepSeek API密钥(可选) + patient_info: 患者信息字典,包含gender字段(从OCR文本提取) + """ + doc = Document(template_path) + + # 第一步:删除占位符行 + print("\n 🧹 正在删除占位符行...") + removed = remove_placeholder_tables(doc) + print(f" ✓ 已删除 {removed} 个占位符行") + + # 加载配置获取模块信息和标准顺序 + from config import load_abb_config, get_standard_module_order, sort_items_by_standard_order, normalize_abb, normalize_module_name + abb_config = load_abb_config() + abb_to_module = abb_config.get('abb_to_module', {}) + abb_to_info = abb_config.get('abb_to_info', {}) + standard_module_order = get_standard_module_order() + + # 性别检测:从OCR文本中提取的patient_info获取性别 + # 将中文"男性"/"女性"转换为英文"male"/"female" + gender_from_ocr = patient_info.get('gender', '') if patient_info else '' + if gender_from_ocr == '男性': + detected_gender = 'male' + print(f" ✓ 性别: 男性 (从OCR文本提取)") + elif gender_from_ocr == '女性': + detected_gender = 'female' + print(f" ✓ 性别: 女性 (从OCR文本提取)") + else: + # 如果没有从OCR提取到性别,使用默认值(女性) + detected_gender = 'female' + print(f" ⚠️ 未从OCR文本提取到性别,使用默认值: 女性") + + # 根据性别确定荷尔蒙项目应该分配到的模块 + hormone_target_module = 'Male Hormone' if detected_gender == 'male' else 'Female Hormone' + + # 定义所有荷尔蒙相关的ABB(这些项目在男性和女性荷尔蒙模块中都可能出现) + hormone_abbs = { + 'E2', 'PROG', 'FSH', 'LH', 'PRL', 'T', 'DHEAS', 'COR', 'CORTISOL', + 'IGF-1', 'IGF1', 'AMH', 'TESTO' + } + + # 按模块分组所有数据 + by_module = {} + unclassified_items = [] # 无法分类的项目 + config_classified = 0 # 配置文件分类计数 + deepseek_classified = 0 # DeepSeek分类计数 + + print("\n 📂 步骤1: 根据配置文件分类...") + + for abb, data in matched_data.items(): + result = data.get('result', '') + if not result or result in ['', '.', '-', '/']: + continue + + # 标准化ABB名称 + normalized_abb = normalize_abb(abb, abb_config) + + # 特殊处理荷尔蒙项目:根据检测到的性别分配到对应的荷尔蒙模块 + # 注意:必须优先于配置文件映射,确保根据性别正确分配 + abb_upper = abb.upper().strip() + normalized_upper = normalized_abb.upper().strip() + is_hormone_abb = 
(abb_upper in hormone_abbs or normalized_upper in hormone_abbs) + + # 如果配置文件中有模块映射,检查是否是荷尔蒙模块 + if not is_hormone_abb: + # 先检查配置文件中的模块映射 + module_from_config = abb_to_module.get(normalized_abb, '') + if not module_from_config: + module_from_config = abb_to_module.get(abb, '') + if not module_from_config: + module_from_config = abb_to_module.get(normalized_abb.upper(), '') + if not module_from_config: + module_from_config = abb_to_module.get(abb.upper(), '') + + # 如果配置文件中映射到荷尔蒙模块,也视为荷尔蒙项目 + if module_from_config in ['Male Hormone', 'Female Hormone']: + is_hormone_abb = True + + # 如果是荷尔蒙项目,根据性别分配到对应的模块 + if is_hormone_abb: + # 根据性别确定目标模块:男性→男性荷尔蒙,女性→女性荷尔蒙 + target_module = 'Male Hormone' if detected_gender == 'male' else 'Female Hormone' + if target_module not in by_module: + by_module[target_module] = [] + by_module[target_module].append((abb, data)) + config_classified += 1 + print(f" ✓ {abb} → [{target_module}] (荷尔蒙项目,根据性别: {detected_gender})") + continue + + # 非荷尔蒙项目:使用配置文件中的模块映射 + # 先尝试精确匹配(处理大小写敏感的ABB如TG/Tg) + module = abb_to_module.get(normalized_abb, '') + if not module: + module = abb_to_module.get(abb, '') + # 再尝试大写匹配(向后兼容) + if not module: + module = abb_to_module.get(normalized_abb.upper(), '') + if not module: + module = abb_to_module.get(abb.upper(), '') + + if module: + # 配置文件分类成功 + if module not in by_module: + by_module[module] = [] + by_module[module].append((abb, data)) + config_classified += 1 + else: + # 需要DeepSeek分类 + unclassified_items.append((abb, data)) + + print(f" ✓ 配置文件分类: {config_classified} 个项目") + print(f" ⏳ 待DeepSeek分类: {len(unclassified_items)} 个项目") + + # 使用DeepSeek分类未匹配的项目 + if unclassified_items: + print("\n 🤖 步骤2: 使用DeepSeek分类未匹配项目...") + items_to_remove = [] + for abb, data in unclassified_items: + module = classify_abb_module(abb, data.get('project', abb), api_key) + if module: + # 标准化模块名称 + original_module = module + module = normalize_module_name(module, abb_config) + + # 如果DeepSeek分类结果是荷尔蒙模块,必须根据性别重新分配 + if module in ['Male 
Hormone', 'Female Hormone']: + # 根据性别确定目标模块:男性→男性荷尔蒙,女性→女性荷尔蒙 + module = 'Male Hormone' if detected_gender == 'male' else 'Female Hormone' + print(f" ✓ {abb} → [{original_module}] → [{module}] (荷尔蒙项目,根据性别: {detected_gender})") + elif original_module != module: + print(f" ✓ {abb} → [{original_module}] → [{module}]") + else: + print(f" ✓ {abb} → [{module}]") + + if module not in by_module: + by_module[module] = [] + by_module[module].append((abb, data)) + deepseek_classified += 1 + items_to_remove.append((abb, data)) + else: + print(f" ✗ {abb} 无法分类") + + # 从未分类列表中移除已分类的项目 + for item in items_to_remove: + unclassified_items.remove(item) + + print(f" ✓ DeepSeek分类: {deepseek_classified} 个项目") + + total_classified = config_classified + deepseek_classified + print(f"\n 📋 分类完成: 共 {total_classified} 个项目,分布在 {len(by_module)} 个模块") + if unclassified_items: + print(f" ⚠️ {len(unclassified_items)} 个项目无法分类: {[i[0] for i in unclassified_items]}") + + # 第三步:按标准模块顺序处理 + added_count = 0 + skipped_modules = [] + + print("\n 📝 步骤3: 按标准顺序填充模块...") + + # 辅助函数:在项目列表中查找配对项目 + def find_paired_item_in_list(items, target_abb): + """在项目列表中查找指定ABB的项目""" + target_upper = target_abb.upper().strip() + for abb, data in items: + if abb.upper().strip() == target_upper: + return (abb, data) + return None + + # 辅助函数:处理模块中的项目(支持配对项目) + def process_module_items(doc, module, sorted_items, position, abb_to_info, abb_config, api_key, gender=None): + """处理模块中的项目,支持配对项目合并显示""" + nonlocal added_count + + insert_pos = position + is_first_item = True + processed_abbs = set() # 记录已处理的ABB + + for abb, data in sorted_items: + abb_upper = abb.upper().strip() + + # 跳过已处理的项目 + if abb_upper in processed_abbs: + continue + + result = data.get('result', '') + point = data.get('point', '') + reference = data.get('reference', '') + unit = data.get('unit', '') + + # 获取项目信息 + normalized_abb = normalize_abb(abb, abb_config) + info = abb_to_info.get(normalized_abb, {}) + if not info: + info = abb_to_info.get(abb, {}) + if not 
info: + info = abb_to_info.get(normalized_abb.upper(), {}) + if not info: + info = abb_to_info.get(abb.upper(), {}) + # 优先使用配置文件中的中文名称,其次使用data中的project_cn + name = info.get('project_cn') or data.get('project_cn') + # 如果没有中文名称,调用DeepSeek翻译 + if not name: + english_name = info.get('project') or data.get('project', abb) + name = translate_project_name_to_chinese(abb, english_name, api_key) + + # 检查是否是配对项目,并且配对项目是否都存在于数据中 + if is_paired_item(abb): + paired_abb, is_base, base_cn, percent_cn = get_paired_item(abb) + + # 查找配对项目是否存在于当前模块的数据中 + paired_item_data = find_paired_item_in_list(sorted_items, paired_abb) if paired_abb else None + + if paired_item_data: + # 两个配对项目都存在,创建配对表格 + paired_abb_actual, paired_data = paired_item_data + + # 确定基础项和百分比项的ABB和数据 + # 使用原始数据中的ABB(保持PDF中的大小写格式) + if is_base: + # 当前项是基础项 + base_abb_name = abb # 原始ABB + percent_abb_name = paired_abb_actual # 原始配对ABB + base_result = result + base_point = point + base_reference = reference + base_unit = unit + percent_result = paired_data.get('result', '') + percent_point = paired_data.get('point', '') + percent_reference = paired_data.get('reference', '') + percent_unit = paired_data.get('unit', '') + else: + # 当前项是百分比项,配对项是基础项 + base_abb_name = paired_abb_actual # 原始配对ABB + percent_abb_name = abb # 原始ABB + percent_result = result + percent_point = point + percent_reference = reference + percent_unit = unit + base_result = paired_data.get('result', '') + base_point = paired_data.get('point', '') + base_reference = paired_data.get('reference', '') + base_unit = paired_data.get('unit', '') + + # 获取AI解释(使用基础项的信息) + ai_explanation = get_ai_explanation(abb, name, result, api_key, gender=gender) + + try: + # 使用配对表格(两行数据都填入) + insert_paired_items_table_with_both_data( + doc, insert_pos, + base_abb_name, percent_abb_name, + base_cn, percent_cn, + base_result, base_point, base_reference, base_unit, + percent_result, percent_point, percent_reference, percent_unit, + ai_explanation['en'], ai_explanation['cn'], + 
include_header=is_first_item # 只有模块第一个项目有表头 + ) + added_count += 1 + insert_pos += 2 + is_first_item = False + + # 标记基础项和百分比项都已处理 + processed_abbs.add(abb_upper) + processed_abbs.add(paired_abb.upper().strip()) + print(f" ✓ 配对项目: {base_abb_name} + {percent_abb_name}") + continue + except Exception as e: + print(f" ✗ 添加配对项目 {abb} 失败: {e}") + else: + # 只有一个配对项目存在,使用普通表格 + print(f" ℹ️ 配对项目 {abb} 的配对项 {paired_abb} 不存在,使用普通表格") + + # 普通项目,创建单独表格 + ai_explanation = get_ai_explanation(abb, name, result, api_key, gender=gender) + + try: + insert_table_after_position( + doc, insert_pos, abb, name, result, + ai_explanation['en'], ai_explanation['cn'], + point=point, reference=reference, unit=unit, + include_header=is_first_item # 只有模块第一个项目有表头 + ) + added_count += 1 + insert_pos += 2 + is_first_item = False + processed_abbs.add(abb_upper) + except Exception as e: + print(f" ✗ 添加 {abb} 失败: {e}") + + return insert_pos + + # 按标准顺序遍历模块 + for module in standard_module_order: + if module not in by_module: + continue + + items = by_module[module] + + # 按标准项目顺序排序(标准项目在前,非标准项目在后) + sorted_items = sort_items_by_standard_order(items, module, abb_config) + + # 找到该模块标题的位置 + position = find_module_title_position(doc, module) + + if position < 0: + # 找不到模块,跳过 + skipped_modules.append((module, len(items))) + continue + + print(f" 📍 模块 [{module}] 标题位置: {position}, 共 {len(sorted_items)} 个项目") + + # 使用新的处理函数(支持配对项目) + process_module_items(doc, module, sorted_items, position, abb_to_info, abb_config, api_key, gender=detected_gender) + + # 处理不在标准顺序中的模块 + for module, items in by_module.items(): + if module in standard_module_order: + continue # 已处理 + + sorted_items = sort_items_by_standard_order(items, module, abb_config) + position = find_module_title_position(doc, module) + + if position < 0: + skipped_modules.append((module, len(items))) + continue + + print(f" 📍 额外模块 [{module}] 标题位置: {position}, 共 {len(sorted_items)} 个项目") + + # 使用新的处理函数(支持配对项目) + 
process_module_items(doc, module, sorted_items, position, abb_to_info, abb_config, api_key, gender=detected_gender) + + if skipped_modules: + print(f"\n ⚠️ 跳过的模块(找不到标题):") + for mod, cnt in skipped_modules: + print(f" - {mod}: {cnt} 个项目") + + if unclassified_items: + print(f"\n ⚠️ 无法分类的项目:") + for abb, data in unclassified_items: + print(f" - {abb}: {data.get('result', '')}") + + print(f"\n✓ 已为 {added_count} 个项目创建单独表格") + + # 使用安全保存 + if output_path: + from xml_safe_save import safe_save + safe_save(doc, output_path, template_path) + print(f"✓ 保存到: {output_path}") + + return doc + + +def fill_word_template(template_path: str, matched_data: dict, output_path: str, api_key: str = None, patient_info: dict = None): + """ + 将匹配的数据填入Word模板(兼容旧接口) + + Args: + template_path: Word模板路径 + matched_data: 匹配的数据字典 + output_path: 输出文件路径 + api_key: DeepSeek API密钥(可选) + patient_info: 患者信息字典(可选) + """ + # 直接调用新版函数 + return fill_word_template_new(template_path, matched_data, output_path, api_key, patient_info) + # 默认单位映射 - 用于补充OCR未识别的单位 + default_units = { + # 血常规 + 'WBC': '10^9/L', 'RBC': '10^12/L', 'HB': 'g/L', 'HGB': 'g/L', 'HCT': '%', + 'MCV': 'fL', 'MCH': 'pg', 'MCHC': 'g/L', 'RDW': '%', 'PLT': '10^9/L', + 'NEUT': '%', 'LYMPH': '%', 'MONO': '%', 'EOS': '%', 'BAS': '%', + 'NEUT#': '10^9/L', 'LYMPH#': '10^9/L', 'MONO#': '10^9/L', 'EOS#': '10^9/L', 'BAS#': '10^9/L', + # 肝功能 + 'ALT': 'U/L', 'AST': 'U/L', 'GGT': 'U/L', 'ALP': 'U/L', 'LDH': 'U/L', + 'TBIL': 'μmol/L', 'DBIL': 'μmol/L', 'IBIL': 'μmol/L', + 'TP': 'g/L', 'ALB': 'g/L', 'GLB': 'g/L', + # 肾功能 + 'BUN': 'mmol/L', 'CREA': 'μmol/L', 'UA': 'μmol/L', 'EGFR': 'mL/min/1.73m²', + # 血脂 + 'TC': 'mmol/L', 'TG': 'mmol/L', 'HDL': 'mmol/L', 'LDL': 'mmol/L', + 'APOA1': 'g/L', 'APOB': 'g/L', 'LP(A)': 'mg/L', + # 电解质 + 'NA': 'mmol/L', 'K': 'mmol/L', 'CL': 'mmol/L', 'CA': 'mmol/L', + 'P': 'mmol/L', 'MG': 'mmol/L', 'FE': 'μmol/L', 'ZN': 'μmol/L', 'CU': 'μmol/L', + 'TCO2': 'mmol/L', + # 血糖 + 'GLU': 'mmol/L', 'HBA1C': '%', 'OGTT': 'mmol/L', + # 
甲状腺 + 'TSH': 'mIU/L', 'FT3': 'pmol/L', 'FT4': 'pmol/L', 'T3': 'nmol/L', 'T4': 'nmol/L', + # 激素 + 'E2': 'pmol/L', 'PROG': 'nmol/L', 'TESTO': 'nmol/L', 'FSH': 'IU/L', 'LH': 'IU/L', + 'PRL': 'mIU/L', 'CORTISOL': 'nmol/L', 'DHEA-S': 'μmol/L', 'IGF-1': 'ng/mL', + # 肿瘤标志物 + 'AFP': 'ng/mL', 'CEA': 'ng/mL', 'CA125': 'U/mL', 'CA153': 'U/mL', 'CA199': 'U/mL', + 'PSA': 'ng/mL', 'NSE': 'ng/mL', 'CYFRA21-1': 'ng/mL', + # 凝血 + 'PT': 's', 'APTT': 's', 'TT': 's', 'FIB': 'g/L', 'D-DIMER': 'mg/L', 'INR': '', + # 炎症/免疫 + 'CRP': 'mg/L', 'HS-CRP': 'mg/L', 'RF': 'IU/mL', 'ESR': 'mm/h', + 'IGG': 'g/L', 'IGA': 'g/L', 'IGM': 'g/L', 'IGE': 'IU/mL', + 'C3': 'g/L', 'C4': 'g/L', + # 维生素 + 'VITB12': 'pmol/L', 'FOLATE': 'nmol/L', '25-OH-VITD': 'nmol/L', + # 尿常规 - 大部分定性无单位 + 'SG': '', 'PH': '', 'PRO': '', 'GLU': '', 'KET': '', 'BIL': '', 'NIT': '', 'URO': '', 'LEU': '', + # 血型 + 'ABO': '', 'RH': '', + # 传染病 + 'HBSAG': '', 'HBSAB': '', 'HBEAG': '', 'HBEAB': '', 'HBCAB': '', + 'ANTI-HCV': '', 'HIV': '', 'RPR': '', + # 自身抗体 + 'ANA': '', 'ANTI-SM': '', 'ANTI-RNP': '', + # 同型半胱氨酸 + 'HCY': 'μmol/L', + # 骨代谢 + 'OSTE': 'ng/mL', 'P1NP': 'ng/mL', 'CTX': 'ng/mL', 'PTH': 'pg/mL', + } + + # 默认参考范围映射 - 用于补充OCR未识别的参考范围 + default_references = { + # 尿常规定性项目 + 'COLOR': 'Yellow', 'PRO': 'Negative', 'GLU': 'Negative', 'KET': 'Negative', + 'BIL': 'Negative', 'NIT': 'Negative', 'URO': 'Normal', 'LEU': 'Negative', + 'BLD': 'Negative', 'PH': '(4.5-8.0)', 'SG': '(1.003-1.030)', + # 传染病 + 'HBSAG': 'Negative', 'HBSAB': 'Negative/Positive', 'HBEAG': 'Negative', + 'HBEAB': 'Negative', 'HBCAB': 'Negative', 'ANTI-HCV': 'Negative', + 'HIV': 'Non-reactive', 'RPR': 'Non-reactive', + # 自身抗体 + 'ANA': 'Negative', 'ANTI-SM': 'Negative', 'ANTI-RNP': 'Negative', 'RF': 'Negative', + # 血型 + 'ABO': 'A/B/O/AB', 'RH': 'Positive/Negative', + } + + # 表头关键词 - 用于识别真正的表头行 + # 真正的表头行应该同时包含ABB+Project或ABB+Result等组合 + header_core = ['abb', '简称'] # 表头行必须包含这些词之一 + header_extra = ['project', '项目', 'result', '结果', 'refer', '参考', 'unit', '单位'] + + # 
Word模板ABB别名映射:Word中的格式 -> 提取数据中的ABB + word_abb_aliases = { + # 肿瘤标志物 + 'CA15-3': 'CA153', 'CA19-9': 'CA199', + # 血型 + 'BLOOD TYPE': 'ABO', 'BLOOD TYPE RH': 'Rh', 'ABO': 'ABO', 'RH': 'Rh', + # 电解质 + 'CALCIUM': 'CA', 'MAGNESIUM': 'MG', 'CHLORIDE': 'CL', 'SODIUM': 'NA', 'KALIUM': 'K', + 'PHOSPHORUS': 'P', 'NA': 'NA', 'K': 'K', 'CL': 'CL', 'P': 'P', + # 糖代谢 + 'HBA1C': 'HBA1C', 'FBS': 'GLU', 'FPG': 'FPG', 'EAG': 'EAG', + # 激素 + 'IGF1': 'IGF-1', 'DHEAS': 'DHEA-S', 'DHEA-S': 'DHEA-S', 'COR': 'CORTISOL', 'TESTO': 'TESTO', + # 尿检项目 + 'COLOR': 'COLOR', 'KET': 'KET', 'PRO': 'PRO', 'NIT': 'NIT', + 'PH': 'PH', 'SG': 'SG', 'BLD/ERY': 'ERY', 'CLARITY': 'CLARITY', 'TUR': 'CLARITY', + 'BIL': 'BIL', 'ERY': 'ERY', 'URO': 'URO', 'LEU': 'LEU', + # 血常规 + 'BAS': 'BAS', 'EOS': 'EOS', 'LYMPH': 'LYMPH', 'MONO': 'MONO', 'NEUT': 'NEUT', + 'BAS%': 'BAS', 'EOS%': 'EOS', 'LYMPH%': 'LYMPH', 'MONO%': 'MONO', 'NEUT%': 'NEUT', + 'RBC COUNT': 'RBC', 'WBC COUNT': 'WBC', 'TOTAL RBC': 'RBC', + 'MCH': 'MCH', 'RDW': 'RDW', 'RBC': 'RBC', 'WBC': 'WBC', + # 免疫 + 'ANTI-SM': 'ANTI-SM', 'ANTI-RNP': 'ANTI-RNP', 'ANA': 'ANA', 'ASO': 'ASO', + 'H. 
PYLORI IGG': 'H.PYLORI', + # 骨代谢 + '25-OH-VD2+D3': '25-OH-VITD', '25-OH-VITD': '25-OH-VITD', + 'Β - CTX': 'CTX', 'CTX': 'CTX', 'TPINP': 'P1NP', 'OST': 'OSTE', + # 同型半胱氨酸 + 'HOMOCYSTEINE': 'HCY', 'HCY': 'HCY', + # 凝血 + 'INR': 'INR', 'D - DIMER': 'D-DIMER', 'D-DIMER': 'D-DIMER', + 'APTT': 'APTT', 'PT': 'PT', 'TT': 'TT', + # 肾功能 + 'SCR': 'CR', 'CR': 'CR', 'UA': 'UA', 'EGFR': 'EGFR', + # 肝功能 + 'DBIL': 'DBIL', 'IBIL': 'IBIL', 'ALB': 'ALB', 'GLB': 'GLB', + # 血脂 + 'TCO2': 'TCO2', 'AG': 'AG', 'VLDL': 'VLDL', 'LP(A)': 'LP(A)', 'LP(A)': 'LP(A)', 'APOB': 'APOB', + # 铁代谢 + 'FER': 'FERRITIN', 'FERRITIN': 'FERRITIN', 'FE': 'FE', + # 前列腺 + 'TPSA': 'PSA', 'PSA': 'PSA', 'FPSA': 'FPSA', + # 肿瘤 + 'CYFRA21-1': 'CYFRA21-1', 'NSE': 'NSE', + # 传染病 + 'HIV': 'HIV', 'RPR': 'RPR', 'ANTI-HCV': 'ANTI-HCV', 'SAPA': 'SAPA', + 'TRUST': 'RPR', 'TPPA': 'TPPA', + # 微量元素 + 'MN': 'MN', 'NI': 'NI', 'MIB': 'MIB', 'CIB': 'CIB', 'ZN': 'ZN', 'CU': 'CU', + # 其他 + 'SEC': 'SEC', 'CRY': 'CRY', 'T4-TOTAL': 'T4', 'T4': 'T4', + } + + # 遍历所有表格 + for table_idx, table in enumerate(doc.tables): + for row_idx, row in enumerate(table.rows): + cells = row.cells + if len(cells) < 2: + continue + + # 获取整行文本用于判断 + row_text = ' '.join([c.text.strip().lower() for c in cells]) + + # 跳过表头行 - 必须同时包含ABB关键词和其他表头词 + is_header = any(kw in row_text for kw in header_core) and any(kw in row_text for kw in header_extra) + if is_header: + continue + + # 跳过标题行 (如 "Complete Blood Count 血常规") + if 'complete blood' in row_text or 'blood count' in row_text: + continue + if 'clinical significance' in row_text: + continue + if '临床意义' in row_text or '检测' in row_text: + continue + + # 获取第一个单元格作为ABB + first_cell_text = cells[0].text.strip() + + # 跳过空行 + if not first_cell_text: + continue + + # ABB应该是短字符串,通常是大写字母组合 + # 跳过太长的或包含中文的 + if len(first_cell_text) > 20: + continue + if any('\u4e00' <= c <= '\u9fff' for c in first_cell_text): + # 第一列包含中文,可能不是ABB列,检查是否是数据行的其他格式 + continue + + abb = first_cell_text.upper() + + # 通过别名映射转换Word模板中的ABB格式 + 
lookup_abb = word_abb_aliases.get(abb, abb) + + # 构建大小写不敏感的查找表 + matched_data_upper = {k.upper(): v for k, v in matched_data.items()} + + # 查找匹配的数据(大小写不敏感) + data = None + # 优先用别名转换后的ABB查找 + if lookup_abb.upper() in matched_data_upper: + data = matched_data_upper[lookup_abb.upper()] + elif abb in matched_data_upper: + data = matched_data_upper[abb] + else: + # 尝试模糊匹配 - 处理带括号的情况如 "Hemoglobin(Hb)" 匹配 "HB" + for key in matched_data: + key_upper = key.upper() + if abb in key_upper.replace('(', ' ').replace(')', ' ').split(): + data = matched_data[key] + break + + if data: + # 找到匹配数据,标记为已填充(无论是否实际写入) + filled_abbs.add(lookup_abb.upper()) + + # 确定列索引 - 基于模板结构 + # 列0: ABB, 列1-2: Project, 列3-4: Result, 列5-6: Point, 列7-8: Refer, 列9-10: Unit + try: + # 预处理:修复OCR解析错误(结果被放到unit字段的情况) + result_val = data.get('result', '') + unit_val = data.get('unit', '') + + # 如果result无效但unit包含颜色/定性结果,则从unit提取 + if result_val in ['', '.', '-', '/'] and unit_val: + # 检查unit是否包含颜色值 + colors = ['yellow', 'amber', 'straw', 'colorless', 'red', 'brown', 'dark'] + for color in colors: + if color in unit_val.lower(): + result_val = color.capitalize() + unit_val = '' + break + + # 填充Result (列3) + if result_val and result_val not in ['.', '-', '/'] and len(cells) > 3: + # 检查目标单元格是否为空或只包含占位符(包括模板变量{{xxx}}) + current_text = cells[3].text.strip() + is_empty = not current_text or current_text in ['', '-', '/'] or current_text.startswith('{{') + if is_empty: + cells[3].text = str(result_val) + filled_count += 1 + + # 填充Point (列5) + if data.get('point') and len(cells) > 5: + current_text = cells[5].text.strip() + if not current_text or current_text in ['', '-', '/']: + cells[5].text = data['point'] + + # 填充Reference (列7) - 优先使用提取的参考范围,否则使用默认值 + if len(cells) > 7: + current_text = cells[7].text.strip() + if not current_text or current_text in ['', '-', '/']: + ref = data.get('reference', '') + if not ref: + # 使用默认参考范围 + ref = default_references.get(abb, '') + if ref: + cells[7].text = ref + + # 填充Unit (列9) - 
优先使用提取的单位,否则使用默认单位 + if len(cells) > 9: + current_text = cells[9].text.strip() + if not current_text or current_text in ['', '-', '/']: + unit = data.get('unit', '') + # 检查unit是否有效(排除混入的参考范围) + if unit: + invalid_unit = ( + len(unit) > 20 or # 单位不应该太长 + 'normal' in unit.lower() or + '[' in unit or ']' in unit or + '(' in unit or ')' in unit or + '-' in unit and any(c.isdigit() for c in unit) # 包含数字范围 + ) + if invalid_unit: + unit = '' + if not unit: + # 使用默认单位 + unit = default_units.get(abb, '') + if unit: + cells[9].text = unit + + except Exception as e: + print(f"Error filling {abb}: {e}") + pass + + # 计算未填充的项目(大小写不敏感比较) + filled_abbs_upper = {a.upper() for a in filled_abbs} + unfilled_abbs = {k for k in matched_data.keys() if k.upper() not in filled_abbs_upper} + + if unfilled_abbs: + print(f"\n 📋 发现 {len(unfilled_abbs)} 个未匹配项目,将添加到报告末尾") + add_missing_items_table(doc, unfilled_abbs, matched_data, api_key) + + cleaned_count = 0 + for table in doc.tables: + for row in table.rows: + for cell in row.cells: + if placeholder_pattern.search(cell.text): + cell.text = placeholder_pattern.sub('', cell.text).strip() + cleaned_count += 1 + if cleaned_count > 0: + print(f" 🧹 清理 {cleaned_count} 个占位符") + + # 保存 + doc.save(output_path) + print(f"\n✓ 已填充 {filled_count} 个数据项") + print(f"✓ 保存到: {output_path}") + + return doc + + +# DeepSeek API配置(仅从环境变量/.env读取,不在代码中硬编码Key) +DEEPSEEK_API_KEY = os.environ.get('DEEPSEEK_API_KEY', '') +DEEPSEEK_API_URL = "https://api.deepseek.com/v1/chat/completions" + +# DeepSeek缓存文件路径 +DEEPSEEK_CACHE_FILE = Path(__file__).parent / "deepseek_cache.json" +_deepseek_cache = None # 内存缓存 + +def load_deepseek_cache(): + """加载DeepSeek缓存""" + global _deepseek_cache + if _deepseek_cache is not None: + return _deepseek_cache + + if DEEPSEEK_CACHE_FILE.exists(): + try: + with open(DEEPSEEK_CACHE_FILE, 'r', encoding='utf-8') as f: + _deepseek_cache = json.load(f) + except (OSError, ValueError): + _deepseek_cache = {'classifications': {}, 
'explanations': {}} + else: + _deepseek_cache = {'classifications': {}, 'explanations': {}} + return _deepseek_cache + +def save_deepseek_cache(): + """保存DeepSeek缓存""" + global _deepseek_cache + if _deepseek_cache: + with open(DEEPSEEK_CACHE_FILE, 'w', encoding='utf-8') as f: + json.dump(_deepseek_cache, f, ensure_ascii=False, indent=2) + + +def translate_project_name_to_chinese(abb: str, project_name: str, api_key: str = None) -> str: + """ + 将英文项目名称翻译为中文 + + Args: + abb: 项目缩写 + project_name: 英文项目名称 + api_key: DeepSeek API Key + + Returns: + 中文项目名称 + """ + if not project_name or not api_key: + return project_name + + # 检查缓存 + cache = load_deepseek_cache() + if 'translations' not in cache: + cache['translations'] = {} + + cache_key = f"{abb}:{project_name}" + if cache_key in cache['translations']: + return cache['translations'][cache_key] + + # 调用DeepSeek翻译 + prompt = f"""请将以下医学检测项目名称翻译为中文。只返回中文翻译,不要其他内容。 + +项目缩写: {abb} +英文名称: {project_name} + +要求: +1. 使用标准医学术语 +2. 简洁准确 +3. 只返回中文名称,不要其他说明""" + + try: + response = call_deepseek_api(prompt, api_key, max_tokens=100, timeout=30) + if response: + # 清理响应 + cn_name = response.strip() + # 移除可能的引号和多余内容 + cn_name = cn_name.strip('"\'') + if '\n' in cn_name: + cn_name = cn_name.split('\n')[0].strip() + + # 保存到缓存 + cache['translations'][cache_key] = cn_name + save_deepseek_cache() + return cn_name + except Exception as e: + print(f" ⚠️ 翻译 {abb} 失败: {e}") + + return project_name + + +def enhance_data_with_deepseek(matched_data: dict, api_key: str) -> dict: + """ + 使用DeepSeek智能补充数据: + 1. 为没有参考范围的项目补充参考范围(包括定性结果) + 2. 
判断没有point标记但可能异常的项目 + + Args: + matched_data: 匹配后的数据字典 + api_key: DeepSeek API Key + + Returns: + 增强后的数据字典 + """ + import json + + # 收集需要处理的项目 + items_need_reference = [] # 需要补充参考范围的项目 + items_need_check = [] # 需要判断是否异常的项目 + + # 定性结果关键词 + qualitative_keywords = ['negative', 'positive', 'non-reactive', 'reactive', + 'normal', 'abnormal', '阴性', '阳性', '正常', '异常', + 'clear', 'cloudy', 'yellow', 'amber', 'trace', 'nil'] + + for abb, data in matched_data.items(): + result = data.get('result', '').strip() + reference = data.get('reference', '').strip() + point = data.get('point', '').strip() + unit = data.get('unit', '').strip() + project = data.get('project', abb) + + # 检查是否是定性结果 + is_qualitative = any(kw in result.lower() for kw in qualitative_keywords) + + # 定性结果没有参考范围,需要补充 + if is_qualitative and not reference: + items_need_reference.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'unit': unit, + 'is_qualitative': True + }) + continue + + # 尝试解析数值结果 + try: + # 处理可能的数值格式 + result_clean = result.replace(',', '').replace(' ', '') + result_value = float(result_clean) + + # 需要补充参考范围 + if not reference: + items_need_reference.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'unit': unit, + 'is_qualitative': False + }) + + # 有参考范围但没有point标记,需要判断是否异常 + if reference and not point: + items_need_check.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'reference': reference, + 'unit': unit + }) + except (ValueError, TypeError): + # 非数值结果且不是已知定性结果,也尝试补充参考范围 + if not reference and result: + items_need_reference.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'unit': unit, + 'is_qualitative': True + }) + + print(f" 需要补充参考范围: {len(items_need_reference)} 个项目") + print(f" 需要判断异常: {len(items_need_check)} 个项目") + + # 1. 
补充参考范围 + if items_need_reference: + print(" 正在调用DeepSeek补充参考范围...") + items_desc = [] + for item in items_need_reference[:30]: # 限制数量避免prompt过长 + desc = f"- {item['abb']}: {item['project']}, 结果: {item['result']}" + if item['unit']: + desc += f" {item['unit']}" + if item.get('is_qualitative'): + desc += " (定性检测)" + items_desc.append(desc) + + prompt = f"""你是一位医学检验专家。请为以下检测项目提供标准参考范围。 + +## 检测项目: +{chr(10).join(items_desc)} + +## 要求: +1. 提供成人的标准参考范围 +2. 数值型参考范围格式示例:3.5-5.5、0-10、0-40 +3. 定性检测的参考范围通常是:Negative、Non-Reactive、Normal、Clear 等 +4. 如果不确定,可以返回空字符串 +5. 不要使用 < 或 > 符号,用具体范围表示,如 <5 改为 0-5 + +## 输出格式(JSON): +```json +{{ + "ABB1": "参考范围", + "ABB2": "参考范围" +}} +``` + +只返回JSON,不要其他说明。""" + + try: + response = call_deepseek_api(prompt, api_key, max_tokens=1000, timeout=60) + if response: + # 解析JSON + if '```json' in response: + response = response.split('```json')[1].split('```')[0] + elif '```' in response: + response = response.split('```')[1].split('```')[0] + + references = json.loads(response.strip()) + updated_count = 0 + for abb, ref in references.items(): + # 尝试多种匹配方式 + matched_key = None + if abb in matched_data: + matched_key = abb + elif abb.upper() in matched_data: + matched_key = abb.upper() + elif abb.lower() in matched_data: + matched_key = abb.lower() + + if matched_key and ref: + matched_data[matched_key]['reference'] = ref + updated_count += 1 + print(f" ✓ 已补充 {updated_count} 个项目的参考范围") + except Exception as e: + print(f" ⚠️ 补充参考范围失败: {e}") + + # 2. 判断异常项目 + if items_need_check: + print(" 正在调用DeepSeek判断异常项目...") + items_desc = [] + for item in items_need_check[:30]: # 限制数量 + desc = f"- {item['abb']}: {item['project']}, 结果: {item['result']}, 参考范围: {item['reference']}" + if item['unit']: + desc += f", 单位: {item['unit']}" + items_desc.append(desc) + + prompt = f"""你是一位医学检验专家。请判断以下检测项目的结果是否异常。 + +## 检测项目: +{chr(10).join(items_desc)} + +## 判断规则: +1. 如果结果超出参考范围上限,标记为 "↑"(偏高) +2. 如果结果低于参考范围下限,标记为 "↓"(偏低) +3. 如果结果在参考范围内,标记为 ""(正常,空字符串) +4. 
参考范围格式可能是:3.5-5.5、<10、>100、0-40 等 + +## 输出格式(JSON): +```json +{{ + "ABB1": "↑", + "ABB2": "↓", + "ABB3": "" +}} +``` + +只返回JSON,不要其他说明。""" + + try: + response = call_deepseek_api(prompt, api_key, max_tokens=1000, timeout=60) + if response: + # 解析JSON + if '```json' in response: + response = response.split('```json')[1].split('```')[0] + elif '```' in response: + response = response.split('```')[1].split('```')[0] + + abnormal_flags = json.loads(response.strip()) + abnormal_count = 0 + for abb, flag in abnormal_flags.items(): + # 尝试多种匹配方式(与补充参考范围的逻辑保持一致,避免漏掉非全大写的ABB如Tg) + matched_key = next((k for k in (abb, abb.upper(), abb.lower()) if k in matched_data), None) + if matched_key and flag in ['↑', '↓', 'H', 'L']: + matched_data[matched_key]['point'] = flag + abnormal_count += 1 + print(f" ✓ {matched_key}: {flag}") + print(f" ✓ 发现 {abnormal_count} 个新异常项目") + except Exception as e: + print(f" ⚠️ 判断异常失败: {e}") + + return matched_data + + +def call_deepseek_api(prompt: str, api_key: str = None, max_tokens: int = 2000, timeout: int = 120) -> str: + """ + 调用DeepSeek API + """ + key = api_key or DEEPSEEK_API_KEY + if not key: + return None + + headers = { + "Authorization": f"Bearer {key}", + "Content-Type": "application/json" + } + data = { + "model": "deepseek-chat", + "messages": [{"role": "user", "content": prompt}], + "temperature": 0.3, + "max_tokens": max_tokens + } + + try: + response = requests.post(DEEPSEEK_API_URL, headers=headers, json=data, timeout=timeout) + if response.status_code == 200: + return response.json()["choices"][0]["message"]["content"] + else: + print(f" ⚠ DeepSeek API错误: {response.status_code}") + return None + except Exception as e: + print(f" ⚠ DeepSeek请求失败: {e}") + return None + + +def classify_abb_module(abb: str, project_name: str, api_key: str = None) -> str: + """ + 使用DeepSeek判断ABB项目属于哪个文字模块 + """ + # 首先尝试基于ABB和项目名的规则匹配 + abb_upper = abb.upper() + project_lower = project_name.lower() + + # 预定义的ABB到模块映射 + abb_module_map = { + # 尿检 + 'COLOR': 'Urine Detection', 'CLARITY': 'Urine Detection', 'SG': 'Urine Detection', + 'PH': 'Urine Detection', 
'PRO': 'Urine Detection', 'GLU': 'Urine Detection', + 'KET': 'Urine Detection', 'NIT': 'Urine Detection', 'URO': 'Urine Detection', + 'BIL': 'Urine Detection', 'LEU': 'Urine Detection', 'ERY': 'Urine Detection', + 'BLD': 'Urine Detection', 'CRY': 'Urine Detection', 'BAC': 'Urine Detection', + # 血常规 + 'WBC': 'Complete Blood Count', 'RBC': 'Complete Blood Count', 'HB': 'Complete Blood Count', + 'HGB': 'Complete Blood Count', 'HCT': 'Complete Blood Count', 'MCV': 'Complete Blood Count', + 'MCH': 'Complete Blood Count', 'MCHC': 'Complete Blood Count', 'PLT': 'Complete Blood Count', + 'RDW': 'Complete Blood Count', 'RDW-SD': 'Complete Blood Count', 'RDW-CV': 'Complete Blood Count', + 'MPV': 'Complete Blood Count', 'PDW': 'Complete Blood Count', 'PCT': 'Complete Blood Count', + 'P-LCR': 'Complete Blood Count', + 'NEUT': 'Complete Blood Count', 'NEUT%': 'Complete Blood Count', + 'LYMPH': 'Complete Blood Count', 'LYMPH%': 'Complete Blood Count', + 'MONO': 'Complete Blood Count', 'MONO%': 'Complete Blood Count', + 'EOS': 'Complete Blood Count', 'EOS%': 'Complete Blood Count', + 'BAS': 'Complete Blood Count', 'BAS%': 'Complete Blood Count', + 'ESR': 'Complete Blood Count', + # 肝功能 + 'ALT': 'Liver Function', 'AST': 'Liver Function', 'GGT': 'Liver Function', + 'ALP': 'Liver Function', 'TBIL': 'Liver Function', 'DBIL': 'Liver Function', + 'IBIL': 'Liver Function', 'TP': 'Liver Function', 'ALB': 'Liver Function', + 'GLB': 'Liver Function', 'A/G': 'Liver Function', 'LDH': 'Liver Function', + 'CHE': 'Liver Function', 'TF': 'Liver Function', + # 肾功能 + 'BUN': 'Kidney Function', 'CREA': 'Kidney Function', 'CR': 'Kidney Function', + 'UA': 'Kidney Function', 'EGFR': 'Kidney Function', 'CYS-C': 'Kidney Function', + 'CYSC': 'Kidney Function', 'Β2-MG': 'Kidney Function', 'B2-MG': 'Kidney Function', + # 血脂 + 'TC': 'Lipid Panel', 'TG': 'Lipid Panel', 'HDL': 'Lipid Panel', 'LDL': 'Lipid Panel', + 'VLDL': 'Lipid Panel', 'APOA1': 'Lipid Panel', 'APOB': 'Lipid Panel', 'LP(A)': 'Lipid Panel', + 
'FFA': 'Lipid Panel', + # 电解质 + 'NA': 'Electrolytes', 'K': 'Electrolytes', 'CL': 'Electrolytes', 'CA': 'Electrolytes', + 'P': 'Electrolytes', 'MG': 'Electrolytes', 'FE': 'Electrolytes', 'ZN': 'Electrolytes', + 'CU': 'Electrolytes', 'TCO2': 'Electrolytes', 'AG': 'Electrolytes', + # 糖代谢 + 'FPG': 'Glucose', 'FBS': 'Glucose', 'HBA1C': 'Glucose', 'OGTT': 'Glucose', 'INS': 'Glucose', + 'C-PEP': 'Glucose', 'EAG': 'Glucose', + # 甲状腺 + 'TSH': 'Thyroid', 'FT3': 'Thyroid', 'FT4': 'Thyroid', 'T3': 'Thyroid', 'T4': 'Thyroid', + 'TG-AB': 'Thyroid', 'TGAB': 'Thyroid', 'TPO-AB': 'Thyroid', + # 激素 + 'E2': 'Hormone', 'PROG': 'Hormone', 'TESTO': 'Hormone', 'FSH': 'Hormone', 'LH': 'Hormone', + 'PRL': 'Hormone', 'CORTISOL': 'Hormone', 'DHEA-S': 'Hormone', 'IGF-1': 'Hormone', + # 肿瘤标志物 + 'AFP': 'Tumor Markers', 'CEA': 'Tumor Markers', 'CA125': 'Tumor Markers', + 'CA153': 'Tumor Markers', 'CA199': 'Tumor Markers', 'PSA': 'Tumor Markers', + 'FPSA': 'Tumor Markers', 'TPSA': 'Tumor Markers', 'F/TPSA': 'Tumor Markers', + 'NSE': 'Tumor Markers', 'CYFRA21-1': 'Tumor Markers', + 'SCC': 'Tumor Markers', 'CA724': 'Tumor Markers', 'CA72-4': 'Tumor Markers', + 'CA19-9': 'Tumor Markers', 'CA24-2': 'Tumor Markers', 'CA50': 'Tumor Markers', + 'PROGRP': 'Tumor Markers', + # 凝血 + 'PT': 'Coagulation', 'APTT': 'Coagulation', 'TT': 'Coagulation', 'FIB': 'Coagulation', + 'D-DIMER': 'Coagulation', 'INR': 'Coagulation', 'FDP': 'Coagulation', + # 传染病 + 'HBSAG': 'Infectious Disease', 'HBSAB': 'Infectious Disease', 'HBEAG': 'Infectious Disease', + 'HBEAB': 'Infectious Disease', 'HBCAB': 'Infectious Disease', 'ANTI-HCV': 'Infectious Disease', + 'HIV': 'Infectious Disease', 'RPR': 'Infectious Disease', 'TPPA': 'Infectious Disease', + 'H.PYLORI': 'Infectious Disease', + # 免疫功能 + 'IGG': 'Immune Function', 'IGA': 'Immune Function', 'IGM': 'Immune Function', + 'IGE': 'Immune Function', 'C3': 'Immune Function', 'C4': 'Immune Function', + 'CRP': 'Immune Function', 'HS-CRP': 'Immune Function', 'RF': 'Immune Function', + 
'ANA': 'Immune Function', 'ANTI-SM': 'Immune Function', 'ANTI-RNP': 'Immune Function', + 'ASO': 'Immune Function', 'NK': 'Immune Function', + # 骨代谢 + 'OSTE': 'Bone Metabolism', 'P1NP': 'Bone Metabolism', 'CTX': 'Bone Metabolism', + 'PTH': 'Bone Metabolism', '25-OH-VITD': 'Bone Metabolism', + '25-OH-VD2+D3': 'Bone Metabolism', 'VD3': 'Bone Metabolism', 'VD2': 'Bone Metabolism', + 'OST': 'Bone Metabolism', + # 重金属 + 'PB': 'Heavy Metals', 'MN': 'Heavy Metals', 'NI': 'Heavy Metals', + 'CR': 'Heavy Metals', 'CD': 'Heavy Metals', 'HG': 'Heavy Metals', + # 维生素 + 'VITB12': 'Vitamin', 'FOLATE': 'Vitamin', 'VITD': 'Vitamin', + 'VITA': 'Vitamin', 'VITE': 'Vitamin', 'VITK1': 'Vitamin', + 'VITB1': 'Vitamin', 'VITB2': 'Vitamin', 'VITB3': 'Vitamin', + 'VITB5': 'Vitamin', 'VITB6': 'Vitamin', + 'FER': 'Vitamin', # 铁蛋白(贫血相关) + # 同型半胱氨酸 + 'HCY': 'Homocysteine', + # 血型 + 'ABO': 'Blood Type', 'RH': 'Blood Type', + } + + # TG 歧义消解: 甲状腺球蛋白(Tg/Thyroid) vs 甘油三酯(TG/Lipid Panel) + if abb_upper == 'TG': + if '甲状腺' in project_lower or 'thyroglobulin' in project_lower: + return 'Thyroid' + # 其他情况默认为甘油三酯(Lipid Panel) + + # 尝试规则匹配 + if abb_upper in abb_module_map: + return abb_module_map[abb_upper] + + # 基于项目名关键词匹配(英文+中文) + keyword_module = { + # 尿液检测 + 'urine': 'Urine Detection', 'urinary': 'Urine Detection', + '尿液': 'Urine Detection', '尿检': 'Urine Detection', '酸碱度': 'Urine Detection', + '浊度': 'Urine Detection', '隐血': 'Urine Detection', '亚硝酸盐': 'Urine Detection', '酮体': 'Urine Detection', + # 血常规 + 'blood cell': 'Complete Blood Count', 'hemoglobin': 'Complete Blood Count', + 'platelet': 'Complete Blood Count', 'neutrophil': 'Complete Blood Count', + '中性粒细胞': 'Complete Blood Count', '淋巴细胞数量': 'Complete Blood Count', + '血红蛋白': 'Complete Blood Count', '血小板': 'Complete Blood Count', + '嗜酸': 'Complete Blood Count', '嗜碱': 'Complete Blood Count', '单核细胞': 'Complete Blood Count', + '红细胞': 'Complete Blood Count', '白细胞': 'Complete Blood Count', + # 肝功能 + 'liver': 'Liver Function', 'hepat': 'Liver Function', 
'bilirubin': 'Liver Function', + '肝功能': 'Liver Function', '总蛋白': 'Liver Function', '白蛋白': 'Liver Function', + '球蛋白': 'Liver Function', '胆红素': 'Liver Function', '转氨酶': 'Liver Function', + '碱性磷酸酶': 'Liver Function', '谷氨酰': 'Liver Function', + # 肾功能 + 'kidney': 'Kidney Function', 'renal': 'Kidney Function', 'creatinine': 'Kidney Function', + '肾功能': 'Kidney Function', '肌酐': 'Kidney Function', '尿素氮': 'Kidney Function', '尿酸': 'Kidney Function', + # 血脂 + 'cholesterol': 'Lipid Panel', 'triglyceride': 'Lipid Panel', 'lipid': 'Lipid Panel', + '胆固醇': 'Lipid Panel', '甘油三酯': 'Lipid Panel', '脂蛋白': 'Lipid Panel', '血脂': 'Lipid Panel', + # 血糖 + 'glucose': 'Glucose', 'sugar': 'Glucose', 'hba1c': 'Glucose', 'insulin': 'Glucose', + '空腹血糖': 'Glucose', '糖化血红蛋白': 'Glucose', '血糖': 'Glucose', + # 甲状腺 + 'thyroid': 'Thyroid', 'tsh': 'Thyroid', + '甲状腺': 'Thyroid', '促甲状腺': 'Thyroid', + # 激素/荷尔蒙 + 'estrogen': 'Hormone', 'testosterone': 'Hormone', 'progesterone': 'Hormone', + 'cortisol': 'Hormone', 'hormone': 'Hormone', + '雌二醇': 'Hormone', '孕酮': 'Hormone', '睾酮': 'Hormone', '催乳素': 'Hormone', + '皮质醇': 'Hormone', '荷尔蒙': 'Hormone', '促卵泡': 'Hormone', '促黄体': 'Hormone', + '脱氢表雄酮': 'Hormone', '生长因子': 'Hormone', '抗缪勒': 'Hormone', + # 肿瘤标志物 + 'tumor': 'Tumor Markers', 'cancer': 'Tumor Markers', 'antigen': 'Tumor Markers', + '肿瘤': 'Tumor Markers', '甲胎蛋白': 'Tumor Markers', '癌胚抗原': 'Tumor Markers', + '铁蛋白': 'Tumor Markers', '糖类抗原': 'Tumor Markers', '前列腺': 'Tumor Markers', + '鳞状细胞': 'Tumor Markers', '降钙素': 'Tumor Markers', '烯醇化酶': 'Tumor Markers', + # 凝血 + 'coagul': 'Coagulation', 'thrombin': 'Coagulation', 'fibrin': 'Coagulation', + '凝血': 'Coagulation', '纤维蛋白原': 'Coagulation', + # 传染病 + 'hepatitis': 'Infectious Disease', 'hiv': 'Infectious Disease', 'syphilis': 'Infectious Disease', + '乙肝': 'Infectious Disease', '丙肝': 'Infectious Disease', '梅毒': 'Infectious Disease', + '传染病': 'Infectious Disease', '免疫缺陷病毒': 'Infectious Disease', + # 免疫功能 + 'immun': 'Immune Function', 'antibod': 'Immune Function', 
'complement': 'Immune Function', + '红细胞沉降': 'Immune Function', '免疫球蛋白': 'Immune Function', '补体': 'Immune Function', + 'c反应蛋白': 'Immune Function', '抗链球菌': 'Immune Function', '抗核抗体': 'Immune Function', + '类风湿因子': 'Immune Function', '炎症': 'Immune Function', + # 骨代谢 + 'bone': 'Bone Metabolism', 'osteocalcin': 'Bone Metabolism', + '骨代谢': 'Bone Metabolism', '骨钙素': 'Bone Metabolism', '甲状旁腺': 'Bone Metabolism', + '维生素d': 'Bone Metabolism', '胶原': 'Bone Metabolism', + # 重金属/微量元素 + 'metal': 'Heavy Metals', 'lead': 'Heavy Metals', 'mercury': 'Heavy Metals', + '微量元素': 'Heavy Metals', '重金属': 'Heavy Metals', + # 维生素 + 'vitamin': 'Vitamin', 'folate': 'Vitamin', 'b12': 'Vitamin', + # 同型半胱氨酸 + 'homocysteine': 'Homocysteine', + '同型半胱氨酸': 'Homocysteine', + # 血型 + '血型': 'Blood Type', + # 心肌酶 + '肌酸激酶': 'Immune Function', '乳酸脱氢酶': 'Immune Function', + # 电解质 + '电解质': 'Electrolytes', '钾': 'Electrolytes', '钠': 'Electrolytes', '氯': 'Electrolytes', + '钙': 'Electrolytes', '镁': 'Electrolytes', '磷': 'Electrolytes', + # 胃功能 + '胃蛋白酶原': 'Immune Function', '胃泌素': 'Immune Function', + # 维生素 + '维生素': 'Vitamin', + # 影像学 + '影像': 'Other', '心电图': 'Other', 'b超': 'Other', + # 女性专项 + '妇科': 'Other', '女性专项': 'Other', + } + + # 按关键词长度降序匹配,确保长关键词优先(如 '糖化血红蛋白' 优先于 '血红蛋白') + for keyword, module in sorted(keyword_module.items(), key=lambda x: len(x[0]), reverse=True): + if keyword in project_lower: + return module + + # 如果规则匹配失败,检查缓存或调用DeepSeek API + cache = load_deepseek_cache() + cache_key = f"{abb}:{project_name}" + + # 检查缓存 + if cache_key in cache.get('classifications', {}): + return cache['classifications'][cache_key] + + if api_key: + prompt = f"""请判断以下医学检测项目属于哪个检测模块,只返回模块名称(英文): + +项目缩写: {abb} +项目名称: {project_name} + +可选模块: +- Urine Detection(尿液检测) +- Complete Blood Count(血常规) +- Liver Function(肝功能) +- Kidney Function(肾功能) +- Lipid Panel(血脂) +- Electrolytes(电解质) +- Glucose(糖代谢) +- Thyroid(甲状腺功能) +- Hormone(激素) +- Tumor Markers(肿瘤标志物) +- Coagulation(凝血功能) +- Infectious Disease(传染病) +- Immune Function(免疫功能) +- 
Bone Metabolism(骨代谢) +- Heavy Metals(重金属) +- Vitamin(维生素) +- Other(其他) + +只返回英文模块名称,不要其他内容。""" + + result = call_deepseek_api(prompt, api_key, max_tokens=50) + if result: + result = result.strip() + # Validate that the returned text names one of the known modules + valid_modules = ['Urine Detection', 'Complete Blood Count', 'Liver Function', + 'Kidney Function', 'Lipid Panel', 'Electrolytes', 'Glucose', + 'Thyroid', 'Hormone', 'Tumor Markers', 'Coagulation', + 'Infectious Disease', 'Immune Function', 'Bone Metabolism', + 'Heavy Metals', 'Vitamin', 'Other'] + for vm in valid_modules: + if vm.lower() in result.lower(): + # Save to cache; setdefault avoids a KeyError when the cache has no 'classifications' key yet + cache.setdefault('classifications', {})[cache_key] = vm + save_deepseek_cache() + return vm + + return 'Other' + + +def get_ai_explanation(abb: str, project_name: str, result: str, api_key: str = None, gender: str = None) -> dict: + """ + Get the clinical-significance explanation for a test item. + Priority: 1. template explanation -> 2. cache -> 3. DeepSeek generation -> 4. generic template + + Args: + abb: item abbreviation + project_name: item name + result: test result + api_key: DeepSeek API key + gender: 'male' or 'female'; selects the clinical significance used for COR/Cortisol + """ + import json as json_module + from pathlib import Path + + # ABB alias map: ABB in the extracted data -> ABB in the template explanations + abb_aliases = { + 'WBC': 'WBC COUNT', + 'ABO': 'BLOOD TYPE', + 'Rh': 'BLOOD TYPE RH', + 'HCV': 'HCV-IGM', + 'Scr': 'SCR', + 'DBil': 'DBIL', + 'TBil': 'TBIL', + 'HbA1C': 'HBA1C', + 'Hcy': 'HCY', + 'Fer': 'FER', + 'TgAb': 'TGAB', + 'pH': 'PH', + 'β-CTX': 'Β-CTX', + 'Color': 'COLOR', + 'Clarity': 'TUR', + 'BIL': 'BIL', # urine bilirubin + 'URO': 'URO', # urobilinogen + 'ERY': 'BLD', # urine red blood cells / occult blood + 'IgA': 'IGA', + 'IgE': 'IGE', + 'IgG': 'IGG', + 'IgM': 'IGM', + 'Lp(a)': 'LP(A)', + 'hs-CRP': 'hs-CRP', + # Electrolytes and trace elements (case mapping) + 'Cl': 'CL', + 'Na': 'NA', + 'Mg': 'MG', + 'Ca': 'CA', + 'K': 'K', + 'P': 'P', + # Heavy metals (case mapping) + 'Pb': 'PB', + 'Cr': 'CR', + 'Hg': 'HG', + 'Cd': 'CD', + 'Mn': 'MN', + 'Ni': 'NI', + 'Zn': 'ZN', + 'Cu': 'CU', + 'Fe': 'FE', + # Other + 'CIB': 'CIB', + } + + # Special case for COR/Cortisol: pick the clinical significance by gender + lookup_abb = abb + abb_upper = abb.upper().strip() + if abb_upper in ['COR', 'CORTISOL']: + if gender 
== 'male': + lookup_abb = 'CORTISOL' # 男性使用 CORTISOL 的临床意义 + else: + lookup_abb = 'COR' # 女性使用 COR 的临床意义 + + # 应用别名映射 + if lookup_abb in abb_aliases: + lookup_abb = abb_aliases[lookup_abb] + elif lookup_abb.upper() in abb_aliases: + lookup_abb = abb_aliases[lookup_abb.upper()] + + # 1. 首先尝试从模板解释文件获取 + template_explanations_file = Path(__file__).parent / "template_explanations.json" + if template_explanations_file.exists(): + try: + with open(template_explanations_file, 'r', encoding='utf-8') as f: + template_explanations = json_module.load(f) + + # 先尝试精确匹配(处理大小写敏感的ABB如TG/Tg) + abb_stripped = lookup_abb.strip() + if abb_stripped in template_explanations: + exp = template_explanations[abb_stripped] + if exp.get('clinical_en') and exp.get('clinical_cn'): + return {'en': exp['clinical_en'], 'cn': exp['clinical_cn']} + + # 再尝试大写匹配 + abb_upper_lookup = lookup_abb.upper().strip() + if abb_upper_lookup in template_explanations: + exp = template_explanations[abb_upper_lookup] + if exp.get('clinical_en') and exp.get('clinical_cn'): + return {'en': exp['clinical_en'], 'cn': exp['clinical_cn']} + + # 去除特殊字符后匹配 + abb_clean = ''.join(c for c in abb_upper_lookup if c.isalnum()) + for key, value in template_explanations.items(): + key_clean = ''.join(c for c in key.upper() if c.isalnum()) + if abb_clean == key_clean: + if value.get('clinical_en') and value.get('clinical_cn'): + return {'en': value['clinical_en'], 'cn': value['clinical_cn']} + + # 尝试原始ABB(未经别名转换) + if abb.strip() in template_explanations: + exp = template_explanations[abb.strip()] + if exp.get('clinical_en') and exp.get('clinical_cn'): + return {'en': exp['clinical_en'], 'cn': exp['clinical_cn']} + if abb.upper().strip() in template_explanations: + exp = template_explanations[abb.upper().strip()] + if exp.get('clinical_en') and exp.get('clinical_cn'): + return {'en': exp['clinical_en'], 'cn': exp['clinical_cn']} + + except Exception as e: + pass # 静默失败,继续尝试其他方式 + + # 2. 
Check the cache + cache = load_deepseek_cache() + cache_key = f"{abb}:{project_name}" + + if cache_key in cache.get('explanations', {}): + return cache['explanations'][cache_key] + + # 3. If an API key is available, call DeepSeek + if api_key: + prompt = f"""请为以下医学检测项目生成临床意义说明,分别用英文和中文各一段(每段50-80字)。 + +严格要求: +1. 只描述该检测项目是什么、测量什么、在医学上的意义 +2. 禁止分析具体检测结果或数值 +3. 禁止给出诊断建议、健康建议或治疗建议 +4. 禁止使用"如果升高/降低则..."、"异常时..."等条件分析语句 +5. 禁止使用"可能"、"也许"、"建议"等词汇 +6. 使用客观、专业的医学术语,陈述事实 + +正确示例: +- "白细胞计数反映机体免疫系统状态,是评估感染和炎症的重要指标。" +- "血红蛋白是红细胞中携带氧气的蛋白质,反映血液的携氧能力。" + +错误示例(禁止): +- "白细胞升高可能提示感染..."(禁止分析结果) +- "建议定期复查..."(禁止给建议) + +项目缩写: {abb} +项目名称: {project_name} + +请严格按照以下JSON格式返回,不要其他内容: +{{"en": "英文临床意义说明", "cn": "中文临床意义说明"}}""" + + response = call_deepseek_api(prompt, api_key, max_tokens=500) + if response: + try: + # Try to parse the response as JSON + # Strip any markdown code-fence markers first + clean_response = response.strip() + if '```json' in clean_response: + clean_response = clean_response.split('```json')[1].split('```')[0] + elif '```' in clean_response: + clean_response = clean_response.split('```')[1].split('```')[0] + + data = json_module.loads(clean_response.strip()) + if isinstance(data, dict) and 'en' in data and 'cn' in data: + # Save to cache; setdefault avoids a KeyError when the cache has no 'explanations' key yet + cache.setdefault('explanations', {})[cache_key] = data + save_deepseek_cache() + return data + except Exception: + # Unparseable response or cache write failure: fall through to the predefined templates + pass + + # 4. 
降级:使用预定义模板 + templates = { + 'WBC': {'en': 'White blood cell count reflects immune system status and is an important indicator for evaluating infection and inflammation.', + 'cn': '白细胞计数反映机体免疫系统状态,是评估感染和炎症的重要指标。'}, + 'RBC': {'en': 'Red blood cell count reflects the oxygen-carrying capacity of blood and is used to evaluate anemia status.', + 'cn': '红细胞计数反映血液的携氧能力,用于评估贫血状况。'}, + 'HB': {'en': 'Hemoglobin is the oxygen-carrying protein in red blood cells, reflecting the oxygen transport capacity of blood.', + 'cn': '血红蛋白是红细胞中携带氧气的蛋白质,反映血液的携氧能力。'}, + 'PLT': {'en': 'Platelet count reflects the blood clotting function and hemostatic capacity.', + 'cn': '血小板计数反映血液的凝血功能和止血能力。'}, + 'ALT': {'en': 'Alanine aminotransferase (ALT) is an enzyme primarily found in liver cells, reflecting liver cell integrity.', + 'cn': '谷丙转氨酶(ALT)主要存在于肝细胞中,反映肝细胞的完整性。'}, + 'AST': {'en': 'Aspartate aminotransferase (AST) is an enzyme found in liver and heart muscle cells, reflecting tissue integrity.', + 'cn': '谷草转氨酶(AST)存在于肝脏和心肌细胞中,反映组织的完整性。'}, + 'TC': {'en': 'Total cholesterol is a lipid component in blood, important for cardiovascular health assessment.', + 'cn': '总胆固醇是血液中的脂质成分,对心血管健康评估具有重要意义。'}, + 'TG': {'en': 'Triglycerides are the main form of fat storage in the body, reflecting lipid metabolism status.', + 'cn': '甘油三酯是体内脂肪储存的主要形式,反映脂质代谢状况。'}, + 'GLU': {'en': 'Blood glucose is the primary energy source for cells, essential for diabetes screening and metabolic assessment.', + 'cn': '血糖是细胞的主要能量来源,是糖尿病筛查和代谢评估的重要指标。'}, + 'TSH': {'en': 'TSH level reflects thyroid function and helps diagnose thyroid disorders.', + 'cn': 'TSH水平反映甲状腺功能,有助于诊断甲状腺疾病。'}, + } + + if abb.upper() in templates: + return templates[abb.upper()] + + # 通用模板 + return { + "en": f"{project_name} ({abb}) is a medical test indicator used for health assessment and disease screening.", + "cn": f"{project_name}({abb})是一项医学检测指标,用于健康评估和疾病筛查。" + } + +def find_module_end_position(doc, module_name): + """ + 找到指定模块的最后一个表格位置 + 通过查找模块标题行来精确定位 + 
返回该模块最后一个表格在doc.element.body中的索引 + """ + # 模块标题的精确匹配(必须是标题行,不是普通数据) + module_titles = { + 'Urine Detection': ['urine detection', '尿液检测'], + 'Complete Blood Count': ['complete blood count', '血常规'], + 'Heavy Metals': ['heavy metal', '重金属', 'trace element', '微量元素', 'microelement'], + 'Infectious Disease': ['infectious disease', '传染病', 'hepatitis', '肝炎'], + 'Kidney Function': ['kidney function', '肾功能'], + 'Liver Function': ['liver function', '肝功能'], + 'Lipid Panel': ['lipid panel', '血脂'], + 'Thyroid': ['thyroid function', '甲状腺功能'], + 'Hormone': ['hormone', '激素', 'female hormone', 'male hormone'], + 'Tumor Markers': ['tumor marker', '肿瘤标志物'], + 'Electrolytes': ['electrolyte', '电解质'], + 'Glucose': ['glucose metabolism', '糖代谢'], + 'Coagulation': ['coagulation', '凝血'], + 'Immune Function': ['immune function', '免疫功能', 'humoral immunity', '体液免疫'], + 'Bone Metabolism': ['bone metabolism', '骨代谢'], + } + + titles = module_titles.get(module_name, [module_name.lower()]) + body = doc.element.body + + # 第一步:找到模块标题表格的索引 + module_start_table_idx = -1 + for i, table in enumerate(doc.tables): + # 检查第一行或第二行是否包含模块标题 + for row_idx in range(min(2, len(table.rows))): + row_text = ' '.join([c.text.lower().strip() for c in table.rows[row_idx].cells]) + # 标题行通常在整行都是相同的文字(合并单元格) + if any(title in row_text for title in titles): + module_start_table_idx = i + break + if module_start_table_idx >= 0: + break + + if module_start_table_idx < 0: + return -1 + + # 第二步:找到下一个模块的起始位置(或文档末尾) + next_module_table_idx = len(doc.tables) + all_titles = [] + for t_list in module_titles.values(): + all_titles.extend(t_list) + + for i in range(module_start_table_idx + 1, len(doc.tables)): + table = doc.tables[i] + for row_idx in range(min(2, len(table.rows))): + row_text = ' '.join([c.text.lower().strip() for c in table.rows[row_idx].cells]) + # 检查是否是另一个模块的标题 + if any(title in row_text and title not in titles for title in all_titles): + next_module_table_idx = i + break + if next_module_table_idx < 
len(doc.tables): + break + + # 第三步:找到该模块范围内最后一个表格在body中的位置 + last_table_in_module = next_module_table_idx - 1 + if last_table_in_module < module_start_table_idx: + last_table_in_module = module_start_table_idx + + # 获取body中的位置 + tbl_element = doc.tables[last_table_in_module]._tbl + for idx, child in enumerate(body): + if child is tbl_element: + return idx + + return -1 + +def insert_table_after_position(doc, position, abb, project_name, result, clinical_en, clinical_cn, + point='', reference='', unit='', include_header=False): + """ + 在指定位置后插入新表格(完全复刻模板样式) + 格式(无表头时): + Row 0: ABB | Name | Result | Point | Refer | Unit - 数据行 + Row 1: Clinical Significance (Merged) - 解释行 + + 格式(有表头时): + Row 0: Header - Abb简称 | Project项目 | Result结果 | Point指示 | Refer参考 | Unit单位 + Row 1: ABB | Name | Result | Point | Refer | Unit - 数据行 + Row 2: Clinical Significance (Merged) - 解释行 + """ + from lxml import etree + + # 清理参考范围格式 + reference = clean_reference_range(reference) + + # 根据是否需要表头决定行数 + num_rows = 3 if include_header else 2 + table = doc.add_table(rows=num_rows, cols=6) + table.alignment = WD_TABLE_ALIGNMENT.CENTER + table.autofit = False + + # 设置列宽 + widths = [Cm(2.5), Cm(3.5), Cm(2.5), Cm(2.5), Cm(2.5), Cm(2.5)] + for row in table.rows: + for idx, width in enumerate(widths): + row.cells[idx].width = width + + # 定义字体样式函数 + def set_font(run, bold=False, font_size=10.5): + run.bold = bold + run.font.name = 'Times New Roman' + run.font.size = Pt(font_size) + run._element.rPr.rFonts.set(qn('w:eastAsia'), '宋体') + + # 定义临床意义字体样式函数(华文楷体,11号字) + def set_clinical_font(run, bold=False): + run.bold = bold + run.font.name = '华文楷体' + run.font.size = Pt(11) + run._element.rPr.rFonts.set(qn('w:eastAsia'), '华文楷体') + + # 确定数据行和解释行的索引 + if include_header: + # 有表头:Row 0=表头, Row 1=数据, Row 2=解释 + header_row_idx = 0 + data_row_idx = 1 + sig_row_idx = 2 + + # === 表头行 === + row0 = table.rows[header_row_idx] + headers = [ + ('Abb', '简称'), ('Project', '项目'), ('Result', '结果'), + ('Point', '提示'), ('Refer', 
'参考'), ('Unit', '单位') + ] + for idx, (en, cn) in enumerate(headers): + p = row0.cells[idx].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(f'{en}\n{cn}') + set_font(run, bold=True, font_size=9) + else: + # 无表头:Row 0=数据, Row 1=解释 + data_row_idx = 0 + sig_row_idx = 1 + + # === 数据行 === + data_row = table.rows[data_row_idx] + + # 1. ABB + p = data_row.cells[0].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(abb) + set_font(run, bold=True) + + # 2. 项目名 + p = data_row.cells[1].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(project_name) + set_font(run, bold=True) + + # 3. 结果 + p = data_row.cells[2].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(str(result)) + set_font(run) + + # 4. Point列 + p = data_row.cells[3].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if point: + run = p.add_run(point) + set_font(run) + + # 5. Refer列 + p = data_row.cells[4].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if reference: + run = p.add_run(reference) + set_font(run) + + # 6. 
Unit列 + p = data_row.cells[5].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if unit: + run = p.add_run(unit) + set_font(run) + + # === 临床意义行 === + sig_row = table.rows[sig_row_idx] + top_cell = sig_row.cells[0] + for i in range(1, 6): + top_cell.merge(sig_row.cells[i]) + + # 第一个段落:英文临床意义 + p = top_cell.paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.LEFT + run = p.add_run('Clinical Significance: ') + set_clinical_font(run, bold=True) + run = p.add_run(clinical_en) + set_clinical_font(run) + + # 第二个段落:中文临床意义(独立段落,与案例文件格式一致) + p_cn = top_cell.add_paragraph() + p_cn.alignment = WD_ALIGN_PARAGRAPH.LEFT + run = p_cn.add_run('临床意义:') + set_clinical_font(run, bold=True) + run = p_cn.add_run(clinical_cn) + set_clinical_font(run) + + # === 设置边框 === + # 顶部实线 (黑色) + border_solid = {'val': 'single', 'sz': 4, 'color': '000000', 'space': 0} + # 其他虚线 (灰色) + border_dashed = {'val': 'dashed', 'sz': 4, 'color': 'AAAAAA', 'space': 0} + + for i, row in enumerate(table.rows): + for cell in row.cells: + # 默认四周都是虚线 + top = border_dashed + bottom = border_dashed + left = border_dashed + right = border_dashed + + # 第一行顶部设置为实线 + if i == 0: + top = border_solid + + # 应用边框 + set_cell_border(cell, top=top, bottom=bottom, left=left, right=right) + + # 垂直居中 + cell.vertical_alignment = 1 + + # 移动表格到指定位置 + if position >= 0: + body = doc.element.body + tbl_element = table._tbl + # 从当前位置移除 + body.remove(tbl_element) + # 插入到指定位置后 + body.insert(position + 1, tbl_element) + + # 添加分隔段落(表格后空一行) + if position >= 0: + from docx.oxml import OxmlElement + empty_p = OxmlElement('w:p') + body.insert(position + 2, empty_p) + + return table + + +def insert_paired_items_table(doc, position, + abb, name_cn, result, clinical_en, clinical_cn, + point='', reference='', unit='', + include_header=False): + """ + 在指定位置后插入配对项目表格(两行数据,共享临床意义) + 例如:EOS和EOS%显示在同一个表格中 + + 格式(无表头时): + Row 0: ABB | Name_CN (基础项) | Result | Point | Reference | Unit + Row 1: ABB% | Name_CN (百分比项) | (空) | (空) | (空) | (空) + Row 2: 
Clinical Significance (Merged) - explanation row + + Layout (with header): + Row 0: Header + Row 1: ABB | Name_CN (base item) | Result | Point | Reference | Unit + Row 2: ABB% | Name_CN (percentage item) | (empty) | (empty) | (empty) | (empty) + Row 3: Clinical Significance (Merged) - explanation row + + Note: data is filled only into the row of the item that was passed in (base or percentage); the other row shows only its ABB and name + """ + from lxml import etree + + # Look up the pairing information + abb_upper = abb.upper().strip() + paired_abb, is_base, base_cn, percent_cn = get_paired_item(abb) + + if not paired_abb: + # Not a paired item: fall back to the ordinary table + return insert_table_after_position(doc, position, abb, name_cn, result, + clinical_en, clinical_cn, + point=point, reference=reference, unit=unit, + include_header=include_header) + + # Clean the reference-range format, as the other insert helpers do + reference = clean_reference_range(reference) + + # Determine the ABB and name of the base item and the percentage item + # Data goes into the row of the item that was passed in + if is_base: + abb1 = abb_upper + abb2 = paired_abb + name1 = base_cn + name2 = percent_cn + # Data goes in the first data row + result1, point1, reference1, unit1 = result, point, reference, unit + result2, point2, reference2, unit2 = '', '', '', '' + else: + abb1 = paired_abb + abb2 = abb_upper + name1 = base_cn + name2 = percent_cn + # Data goes in the second data row + result1, point1, reference1, unit1 = '', '', '', '' + result2, point2, reference2, unit2 = result, point, reference, unit + + # Row count depends on whether a header row is needed + num_rows = 4 if include_header else 3 + table = doc.add_table(rows=num_rows, cols=6) + table.alignment = WD_TABLE_ALIGNMENT.CENTER + table.autofit = False + + # Set column widths + widths = [Cm(2.5), Cm(3.5), Cm(2.5), Cm(2.5), Cm(2.5), Cm(2.5)] + for row in table.rows: + for idx, width in enumerate(widths): + row.cells[idx].width = width + + # Font helper for data cells + def set_font(run, bold=False, font_size=10.5): + run.bold = bold + run.font.name = 'Times New Roman' + run.font.size = Pt(font_size) + run._element.rPr.rFonts.set(qn('w:eastAsia'), '宋体') + + # Font helper for the clinical-significance text (华文楷体 / STKaiti, 11 pt) + def set_clinical_font(run, bold=False): + run.bold = bold + run.font.name = '华文楷体' + run.font.size = Pt(11) + run._element.rPr.rFonts.set(qn('w:eastAsia'), '华文楷体') + + # Determine the row indices + if include_header: + header_row_idx = 0 + data_row1_idx = 1 + data_row2_idx = 2 + sig_row_idx = 3 + + 
# === 表头行 === + row0 = table.rows[header_row_idx] + headers = [ + ('Abb', '简称'), ('Project', '项目'), ('Result', '结果'), + ('Point', '提示'), ('Refer', '参考'), ('Unit', '单位') + ] + for idx, (en, cn) in enumerate(headers): + p = row0.cells[idx].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(f'{en}\n{cn}') + set_font(run, bold=True, font_size=9) + else: + data_row1_idx = 0 + data_row2_idx = 1 + sig_row_idx = 2 + + # === 数据行1 (基础项,如EOS) === + data_row1 = table.rows[data_row1_idx] + + # 1. ABB1 + p = data_row1.cells[0].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(abb1) + set_font(run, bold=True) + + # 2. 项目名1 (中文名) + p = data_row1.cells[1].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(name1) + set_font(run, bold=True) + + # 3. Result1 + p = data_row1.cells[2].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if result1: + run = p.add_run(str(result1)) + set_font(run) + + # 4. Point1 + p = data_row1.cells[3].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if point1: + run = p.add_run(str(point1)) + set_font(run) + + # 5. Reference1 + p = data_row1.cells[4].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if reference1: + run = p.add_run(str(reference1)) + set_font(run) + + # 6. Unit1 + p = data_row1.cells[5].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if unit1: + run = p.add_run(str(unit1)) + set_font(run) + + # === 数据行2 (百分比项,如EOS%) === + data_row2 = table.rows[data_row2_idx] + + # 1. ABB2 + p = data_row2.cells[0].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(abb2) + set_font(run, bold=True) + + # 2. 项目名2 (中文名) + p = data_row2.cells[1].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(name2) + set_font(run, bold=True) + + # 3. Result2 + p = data_row2.cells[2].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if result2: + run = p.add_run(str(result2)) + set_font(run) + + # 4. 
Point2 + p = data_row2.cells[3].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if point2: + run = p.add_run(str(point2)) + set_font(run) + + # 5. Reference2 + p = data_row2.cells[4].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if reference2: + run = p.add_run(str(reference2)) + set_font(run) + + # 6. Unit2 + p = data_row2.cells[5].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if unit2: + run = p.add_run(str(unit2)) + set_font(run) + + # === 临床意义行 === + sig_row = table.rows[sig_row_idx] + top_cell = sig_row.cells[0] + for i in range(1, 6): + top_cell.merge(sig_row.cells[i]) + + # 第一个段落:英文临床意义 + p = top_cell.paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.LEFT + run = p.add_run('Clinical Significance: ') + set_clinical_font(run, bold=True) + run = p.add_run(clinical_en) + set_clinical_font(run) + + # 第二个段落:中文临床意义(独立段落,与案例文件格式一致) + p_cn = top_cell.add_paragraph() + p_cn.alignment = WD_ALIGN_PARAGRAPH.LEFT + run = p_cn.add_run('临床意义:') + set_clinical_font(run, bold=True) + run = p_cn.add_run(clinical_cn) + set_clinical_font(run) + + # === 设置边框 === + border_solid = {'val': 'single', 'sz': 4, 'color': '000000', 'space': 0} + border_dashed = {'val': 'dashed', 'sz': 4, 'color': 'AAAAAA', 'space': 0} + + for i, row in enumerate(table.rows): + for cell in row.cells: + top = border_dashed + bottom = border_dashed + left = border_dashed + right = border_dashed + + if i == 0: + top = border_solid + + set_cell_border(cell, top=top, bottom=bottom, left=left, right=right) + cell.vertical_alignment = 1 + + # 移动表格到指定位置 + if position >= 0: + body = doc.element.body + tbl_element = table._tbl + body.remove(tbl_element) + body.insert(position + 1, tbl_element) + + # 添加分隔段落 + if position >= 0: + from docx.oxml import OxmlElement + empty_p = OxmlElement('w:p') + body.insert(position + 2, empty_p) + + return table + + +def insert_paired_items_table_with_both_data(doc, position, + base_abb, percent_abb, + base_cn, percent_cn, + base_result, base_point, 
base_reference, base_unit, + percent_result, percent_point, percent_reference, percent_unit, + clinical_en, clinical_cn, + include_header=False): + """ + 插入配对项目表格,两行数据都填入 + Row 0 (可选): 表头 + Row 1: 基础项 ABB | 中文名 | Result | Point | Reference | Unit + Row 2: 百分比项 ABB | 中文名 | Result | Point | Reference | Unit + Row 3: Clinical Significance (合并单元格) + """ + from lxml import etree + + # 清理参考范围格式 + base_reference = clean_reference_range(base_reference) + percent_reference = clean_reference_range(percent_reference) + + # 根据是否需要表头决定行数 + num_rows = 4 if include_header else 3 + table = doc.add_table(rows=num_rows, cols=6) + table.alignment = WD_TABLE_ALIGNMENT.CENTER + table.autofit = False + + # 设置列宽 + widths = [Cm(2.5), Cm(3.5), Cm(2.5), Cm(2.5), Cm(2.5), Cm(2.5)] + for row in table.rows: + for idx, width in enumerate(widths): + row.cells[idx].width = width + + # 定义字体样式函数 + def set_font(run, bold=False, font_size=10.5): + run.bold = bold + run.font.name = 'Times New Roman' + run.font.size = Pt(font_size) + run._element.rPr.rFonts.set(qn('w:eastAsia'), '宋体') + + # 定义临床意义字体样式函数(华文楷体,11号字) + def set_clinical_font(run, bold=False): + run.bold = bold + run.font.name = '华文楷体' + run.font.size = Pt(11) + run._element.rPr.rFonts.set(qn('w:eastAsia'), '华文楷体') + + # 确定行索引 + if include_header: + header_row_idx = 0 + data_row1_idx = 1 + data_row2_idx = 2 + sig_row_idx = 3 + + # === 表头行 === + row0 = table.rows[header_row_idx] + headers = [ + ('Abb', '简称'), ('Project', '项目'), ('Result', '结果'), + ('Point', '提示'), ('Refer', '参考'), ('Unit', '单位') + ] + for idx, (en, cn) in enumerate(headers): + p = row0.cells[idx].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(f'{en}\n{cn}') + set_font(run, bold=True, font_size=9) + else: + data_row1_idx = 0 + data_row2_idx = 1 + sig_row_idx = 2 + + # === 数据行1 (基础项) === + data_row1 = table.rows[data_row1_idx] + + # 1. 
ABB + p = data_row1.cells[0].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(base_abb) + set_font(run, bold=True) + + # 2. 项目名 (中文名) + p = data_row1.cells[1].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(base_cn) + set_font(run, bold=True) + + # 3. Result + p = data_row1.cells[2].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if base_result: + run = p.add_run(str(base_result)) + set_font(run) + + # 4. Point + p = data_row1.cells[3].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if base_point: + run = p.add_run(str(base_point)) + set_font(run) + + # 5. Reference + p = data_row1.cells[4].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if base_reference: + run = p.add_run(str(base_reference)) + set_font(run) + + # 6. Unit + p = data_row1.cells[5].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if base_unit: + run = p.add_run(str(base_unit)) + set_font(run) + + # === 数据行2 (百分比项) === + data_row2 = table.rows[data_row2_idx] + + # 1. ABB + p = data_row2.cells[0].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(percent_abb) + set_font(run, bold=True) + + # 2. 项目名 (中文名) + p = data_row2.cells[1].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + run = p.add_run(percent_cn) + set_font(run, bold=True) + + # 3. Result + p = data_row2.cells[2].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if percent_result: + run = p.add_run(str(percent_result)) + set_font(run) + + # 4. Point + p = data_row2.cells[3].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if percent_point: + run = p.add_run(str(percent_point)) + set_font(run) + + # 5. Reference + p = data_row2.cells[4].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if percent_reference: + run = p.add_run(str(percent_reference)) + set_font(run) + + # 6. 
Unit + p = data_row2.cells[5].paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.CENTER + if percent_unit: + run = p.add_run(str(percent_unit)) + set_font(run) + + # === 临床意义行 === + sig_row = table.rows[sig_row_idx] + top_cell = sig_row.cells[0] + for i in range(1, 6): + top_cell.merge(sig_row.cells[i]) + + # 第一个段落:英文临床意义 + p = top_cell.paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.LEFT + run = p.add_run('Clinical Significance: ') + set_clinical_font(run, bold=True) + run = p.add_run(clinical_en) + set_clinical_font(run) + + # 第二个段落:中文临床意义(独立段落,与案例文件格式一致) + p_cn = top_cell.add_paragraph() + p_cn.alignment = WD_ALIGN_PARAGRAPH.LEFT + run = p_cn.add_run('临床意义:') + set_clinical_font(run, bold=True) + run = p_cn.add_run(clinical_cn) + set_clinical_font(run) + + # === 设置边框 === + border_solid = {'val': 'single', 'sz': 4, 'color': '000000', 'space': 0} + border_dashed = {'val': 'dashed', 'sz': 4, 'color': 'AAAAAA', 'space': 0} + + for i, row in enumerate(table.rows): + for cell in row.cells: + top = border_dashed + bottom = border_dashed + left = border_dashed + right = border_dashed + + if i == 0: + top = border_solid + + set_cell_border(cell, top=top, bottom=bottom, left=left, right=right) + cell.vertical_alignment = 1 + + # 移动表格到指定位置 + if position >= 0: + body = doc.element.body + tbl_element = table._tbl + body.remove(tbl_element) + body.insert(position + 1, tbl_element) + + # 添加分隔段落 + if position >= 0: + from docx.oxml import OxmlElement + empty_p = OxmlElement('w:p') + body.insert(position + 2, empty_p) + + return table + + +def add_missing_items_table(doc, unfilled_abbs, matched_data, api_key=None): + """ + 添加缺失项目到对应模块尾部 + 流程: + 1. 先用DeepSeek分析所有缺失项目属于哪个模块 + 2. 按标准模块顺序处理,在对应模块尾部添加表格 + 3. 
然后调用DeepSeek生成Clinical Significance解释 + """ + if not unfilled_abbs: + print("\n ✓ 没有缺失项目需要添加") + return + + # 加载配置获取模块信息和标准顺序 + from config import load_abb_config, get_standard_module_order, sort_items_by_standard_order, normalize_abb, normalize_module_name + abb_config = load_abb_config() + abb_to_module = abb_config.get('abb_to_module', {}) + abb_to_info = abb_config.get('abb_to_info', {}) + standard_module_order = get_standard_module_order() + + print(f"\n 📋 开始处理 {len(unfilled_abbs)} 个缺失项目...") + + # ===== 第一步:使用DeepSeek分析所有缺失项目属于哪个模块 ===== + print("\n 🔍 步骤1: 分析缺失项目所属模块...") + + by_module = {} # {module: [(abb, data), ...]} + items_to_classify = [] # 需要调用DeepSeek分类的项目 + + for abb in unfilled_abbs: + data = matched_data.get(abb, {}) + result = data.get('result', '') + if not result: + continue + + project_name = data.get('project', abb) + + # 标准化ABB名称 + normalized_abb = normalize_abb(abb, abb_config) + + # 优先使用配置中的模块(先精确匹配,再大写匹配) + module = abb_to_module.get(normalized_abb, '') + if not module: + module = abb_to_module.get(abb, '') + if not module: + module = abb_to_module.get(normalized_abb.upper(), '') + if not module: + module = abb_to_module.get(abb.upper(), '') + + if module: + if module not in by_module: + by_module[module] = [] + by_module[module].append((abb, data)) + print(f" ✓ {abb} → [{module}] (配置文件)") + else: + # 需要DeepSeek分类 + items_to_classify.append((abb, data, project_name)) + + # 批量调用DeepSeek分类 + if items_to_classify: + print(f"\n 🤖 调用DeepSeek分类 {len(items_to_classify)} 个未知项目...") + for abb, data, project_name in items_to_classify: + module = classify_abb_module(abb, project_name, api_key) + # 标准化模块名称 + original_module = module + module = normalize_module_name(module, abb_config) + if original_module != module: + print(f" ✓ {abb} → [{original_module}] → [{module}] (DeepSeek)") + else: + print(f" ✓ {abb} → [{module}] (DeepSeek)") + if module not in by_module: + by_module[module] = [] + by_module[module].append((abb, data)) + + # 打印分组结果 + 
print(f"\n 📊 分组结果:") + for module in standard_module_order: + if module in by_module: + items = by_module[module] + print(f" [{module}]: {len(items)} 个项目 - {[i[0] for i in items]}") + # 打印不在标准顺序中的模块 + for module, items in by_module.items(): + if module not in standard_module_order: + print(f" [{module}] (额外): {len(items)} 个项目 - {[i[0] for i in items]}") + + # ===== 第二步:按标准模块顺序添加表格 ===== + print(f"\n 📝 步骤2: 按标准顺序在对应模块尾部添加表格...") + + # 找到每个模块的标题位置 + module_positions = {} + skipped_modules = [] + for module in by_module.keys(): + pos = find_module_title_position(doc, module) + if pos < 0: + skipped_modules.append(module) + print(f" ⚠️ 模块 [{module}] 找不到标题位置,将跳过") + else: + module_positions[module] = pos + print(f" 📍 模块 [{module}] 标题位置: {pos}") + + # 为每个模块的每个ABB创建表格 + added_items = [] + added_count = 0 + + # 按标准顺序处理模块 + for module in standard_module_order: + if module not in by_module or module in skipped_modules: + continue + + items = by_module[module] + position = module_positions.get(module, -1) + if position < 0: + continue + + # 按标准项目顺序排序 + sorted_items = sort_items_by_standard_order(items, module, abb_config) + + print(f"\n 📁 处理模块 [{module}] ({len(sorted_items)} 个项目)...") + + insert_pos = position + for abb, data in sorted_items: + result = data.get('result', '') + point = data.get('point', '') + reference = data.get('reference', '') + unit = data.get('unit', '') + + normalized_abb = normalize_abb(abb, abb_config) + info = abb_to_info.get(normalized_abb, {}) + if not info: + info = abb_to_info.get(abb, {}) + if not info: + info = abb_to_info.get(normalized_abb.upper(), {}) + if not info: + info = abb_to_info.get(abb.upper(), {}) + # 优先使用配置文件中的中文名称,其次使用data中的project_cn + name = info.get('project_cn') or data.get('project_cn') + # 如果没有中文名称,调用DeepSeek翻译 + if not name: + english_name = info.get('project') or data.get('project', abb) + name = translate_project_name_to_chinese(abb, english_name, api_key) + + # 先用占位符创建表格 + placeholder_en = "[Generating clinical 
significance...]" + placeholder_cn = "[正在生成临床意义...]" + + try: + insert_table_after_position( + doc, insert_pos, abb, name, result, + placeholder_en, placeholder_cn, + point=point, reference=reference, unit=unit, + include_header=False + ) + print(f" ✓ 添加表格: {abb} ({name}) = {result}") + added_items.append((abb, name, result)) + added_count += 1 + insert_pos += 2 + except Exception as e: + print(f" ✗ 添加 {abb} 失败: {e}") + + # 处理不在标准顺序中的模块 + for module, items in by_module.items(): + if module in standard_module_order or module in skipped_modules: + continue + + position = module_positions.get(module, -1) + if position < 0: + continue + + sorted_items = sort_items_by_standard_order(items, module, abb_config) + + print(f"\n 📁 处理额外模块 [{module}] ({len(sorted_items)} 个项目)...") + + insert_pos = position + for abb, data in sorted_items: + result = data.get('result', '') + point = data.get('point', '') + reference = data.get('reference', '') + unit = data.get('unit', '') + + normalized_abb = normalize_abb(abb, abb_config) + info = abb_to_info.get(normalized_abb, {}) + if not info: + info = abb_to_info.get(abb, {}) + if not info: + info = abb_to_info.get(normalized_abb.upper(), {}) + if not info: + info = abb_to_info.get(abb.upper(), {}) + # 优先使用配置文件中的中文名称,其次使用data中的project_cn + name = info.get('project_cn') or data.get('project_cn') + # 如果没有中文名称,调用DeepSeek翻译 + if not name: + english_name = info.get('project') or data.get('project', abb) + name = translate_project_name_to_chinese(abb, english_name, api_key) + + placeholder_en = "[Generating clinical significance...]" + placeholder_cn = "[正在生成临床意义...]" + + try: + insert_table_after_position( + doc, insert_pos, abb, name, result, + placeholder_en, placeholder_cn, + point=point, reference=reference, unit=unit, + include_header=False + ) + print(f" ✓ 添加表格: {abb} ({name}) = {result}") + added_items.append((abb, name, result)) + added_count += 1 + insert_pos += 2 + except Exception as e: + print(f" ✗ 添加 {abb} 失败: {e}") + + 
print(f"\n ✓ 已添加 {added_count} 个表格") + + # ===== Step 3: call DeepSeek to generate the Clinical Significance explanations ===== + if added_items and api_key: + print(f"\n 🤖 步骤3: 调用DeepSeek生成Clinical Significance解释...") + + # Walk the tables in the document, find each placeholder, and replace it with the AI explanation + for abb, name, result in added_items: + print(f" 🤖 生成 {abb} 的临床意义解释...") + ai_explanation = get_ai_explanation(abb, name, result, api_key) + + # Find the table for this ABB in the document and update its explanation + for table in doc.tables: + # enumerate() supplies the row index directly, with no second lookup of the row object + for row_idx, row in enumerate(table.rows): + cells = row.cells + if len(cells) > 0: + first_cell_text = cells[0].text.strip().upper() + if first_cell_text == abb.upper(): + # Matching ABB found; the Clinical Significance row is the next row + if row_idx + 1 < len(table.rows): + sig_row = table.rows[row_idx + 1] + sig_cell = sig_row.cells[0] + if 'Generating' in sig_cell.text or '正在生成' in sig_cell.text: + # Replace the placeholder + sig_cell.text = '' + p = sig_cell.paragraphs[0] + p.alignment = WD_ALIGN_PARAGRAPH.LEFT + + def set_font(run, bold=False, font_size=9): + run.bold = bold + run.font.name = 'Times New Roman' + run.font.size = Pt(font_size) + run._element.rPr.rFonts.set(qn('w:eastAsia'), '宋体') + + run = p.add_run('Clinical Significance: ') + set_font(run, bold=True) + run = p.add_run(ai_explanation['en']) + set_font(run) + run = p.add_run('\n') + run = p.add_run('临床意义:') + set_font(run, bold=True) + run = p.add_run(ai_explanation['cn']) + set_font(run) + print(f" ✓ 已更新 {abb} 的解释") + break + + print(f"\n ✅ 缺失项目处理完成,共添加 {added_count} 个项目") + + +def clean_empty_rows(doc_path: str, output_path: str, patient_info: dict = None): + """Clean up empty data rows and merge data tables under their header table + + Rules: + 1. Delete empty data rows (ABB has content but Result is empty) + 2. 
If a header has only a description below it and no data, delete the description and move the content of the data table below it up + + Important: skip every table in the protected region (the first four pages) and in the "客户功能医学检测档案" (client functional medicine examination file) region + + Args: + doc_path: path of the input document + output_path: path of the output file + patient_info: patient info dict with a gender field (extracted from the OCR text), used for module cleanup + """ + from docx import Document + from lxml import etree + import re + import copy + from xml_safe_save import safe_save + + template_path = Path(__file__).parent / "template_complete.docx" + + doc = Document(doc_path) + + # Locate the protection boundary + protection_boundary = find_health_program_boundary(doc) + print(f" [保护] 清理空行时跳过前 {protection_boundary} 个元素") + + # Locate the "客户功能医学检测档案" (client examination file) region + exam_file_start, exam_file_end = find_examination_file_region(doc) + if exam_file_start >= 0: + print(f" [保护] 清理空行时跳过'客户功能医学检测档案'区域: {exam_file_start}-{exam_file_end}") + + def is_in_protected_region(idx): + """Return True if the index falls inside a protected region""" + # Inside the protected first four pages? + if idx < protection_boundary: + return True + # Inside the client examination file region? 
+ if exam_file_start >= 0 and exam_file_start <= idx < exam_file_end: + return True + return False + + # Collect the tables inside the protected regions (the first four pages and the client examination file region) + body = doc.element.body + body_children = list(body) + protected_tables = set() + for i, elem in enumerate(body_children): + if is_in_protected_region(i): + if elem.tag.endswith('}tbl'): + for t in doc.tables: + if t._tbl is elem: + protected_tables.add(id(t)) + break + print(f" [保护] 保护区域内有 {len(protected_tables)} 个表格将被跳过") + + removed_rows = 0 + merged_count = 0 + + def has_data_in_row(cells): + """Check whether the row holds valid data (judged by the Result column only, so that numbers in the Refer range are not mistaken for results)""" + valid_qualitative = [ + 'negative', 'positive', 'normal', 'reactive', 'non-reactive', + 'a', 'b', 'ab', 'o', # blood types + 'yellow', 'amber', 'straw', 'colorless', 'red', 'brown', 'dark', 'clear' # colors + ] + + # The template layout is usually: + # - 11 columns: 0 ABB, 1-2 Project, 3-4 Result, 5-6 Point, 7-8 Refer, 9-10 Unit + # - 6 columns: 0 ABB, 1 Project, 2 Result, 3 Point, 4 Refer, 5 Unit + if len(cells) >= 11: + result_col_candidates = [3, 4] + elif len(cells) >= 6: + result_col_candidates = [2, 3] + else: + result_col_candidates = [2] + + result_candidates = [] + for col_idx in 
result_col_candidates: + if col_idx < len(cells): + txt = (cells[col_idx].text or '').strip() + if txt: + result_candidates.append(txt) + result_text = result_candidates[0] if result_candidates else '' + + if not result_text: + return False + if result_text in ['', '-', '/', ' ', '.', ':', '{{', '}}']: + return False + if result_text.startswith('{{'): + return False + + # 排除“范围值”形态(常出现在 Refer 列,但模板错位时也可能落到 Result/Point 列) + if re.match(r'^[\(\[]?\s*[-+]?\d+(?:\.\d+)?\s*[-–~]\s*[-+]?\d+(?:\.\d+)?\s*[\)\]]?$', result_text): + return False + + if re.search(r'\d', result_text): + return True + if result_text.lower() in valid_qualitative: + return True + return False + + def is_header_row(row_text, cells=None): + """精确识别表头行""" + # 先排除描述行,避免被误判为表头 + if 'clinical significance' in row_text or '临床意义' in row_text: + return False + + # 表头必须具备“Abb/简称 + Project/项目 + Result/结果”组合特征 + has_abb = ('abb' in row_text) or ('简称' in row_text) + has_project = ('project' in row_text) or ('项目' in row_text) + has_result = ('result' in row_text) or ('结果' in row_text) + if not (has_abb and has_project and has_result): + return False + + # 如果提供了cells,进行更严格的检查 + if cells: + # 表头行通常有多个列且每个单元格内容较短 + non_empty_cells = [c for c in cells if c.text.strip()] + if len(non_empty_cells) < 2: + return False + # 表头单元格内容通常较短(<30字符) + if any(len(c.text.strip()) > 30 for c in cells): + return False + + return True + + def is_title_row(row_text, cells=None): + """识别标题行(如 Blood Type 血型, Four Infectious Diseases 传染病四项)""" + # 先排除描述行,避免解释行误判为标题 + if 'clinical significance' in row_text or '临床意义' in row_text: + return False + + # 常见标题关键词 - 包含所有24个标准模块的关键词 + title_keywords = [ + # 英文关键词 + 'blood count', 'blood type', 'blood sugar', 'blood coagulation', + 'function', 'profile', 'panel', 'test', 'detection', + 'examination', 'analysis', 'screening', 'marker', 'hormone', + 'infectious', 'disease', 'immunoglobulin', 'complement', 'lipid', + 'electrolyte', 'coagulation', 'metabolism', 'microelement', 'trace element', + 
'lymphocyte', 'humoral', 'immunity', 'inflammatory', 'autoantibody', + 'thromboembolism', 'imaging', 'gynecological', 'female-specific', + 'myocardial', 'enzyme', 'cardiac', # 心肌酶谱相关关键词 + # 中文关键词 + '血常规', '血型', '血糖', '凝血', '肝功能', '肾功能', '血脂', '甲状腺', + '检查', '检测', '传染病', '电解质', '骨代谢', '微量元素', '重金属', + '淋巴细胞', '体液免疫', '免疫功能', '炎症', '自身抗体', '心脑血管', + '影像', '妇科', '女性专项', '肿瘤标记物', '肿瘤标志物', '荷尔蒙', + '心肌酶', '心肌酶谱' # 心肌酶谱中文关键词 + ] + if any(kw in row_text for kw in title_keywords): + if cells: + # 获取所有非空单元格的内容 + non_empty_texts = [c.text.strip() for c in cells if c.text.strip()] + # 去重后的内容数量(合并单元格会有相同内容) + unique_texts = set(non_empty_texts) + # 标题行特征:去重后只有1-2种不同内容,或者只有少量非空单元格 + if len(unique_texts) <= 2 or len(non_empty_texts) <= 2: + return True + else: + return True + return False + + def is_description_row(row_text): + return 'clinical significance' in row_text or '临床意义' in row_text + + def is_data_row(first_cell): + if first_cell and 2 <= len(first_cell) <= 15: + clean = first_cell.replace('-', '').replace('/', '').replace('%', '').replace('(', '').replace(')', '').replace(' ', '') + return clean and clean.replace('.', '').isalnum() + return False + + def is_special_table(table): + """检查是否是自动生成的特殊格式表格(防止被合并) + + 特殊表格特征: + 1. 2-4行 + 2. 最后一行包含 "Clinical Significance" 或 "临床意义" + 3. 
第一行不是模块标题(不包含重复的模块名称) + """ + rows = len(table.rows) + if rows < 2 or rows > 4: + return False + + try: + # 检查最后一行是否包含临床意义 + last_row_text = ' '.join([c.text for c in table.rows[-1].cells]).lower() + if 'clinical significance' not in last_row_text and '临床意义' not in last_row_text: + return False + + # 检查第一行是否是模块标题(模块标题表格不是特殊表格) + first_row_text = ' '.join([c.text for c in table.rows[0].cells]).lower() + # 模块标题特征:同一个文本重复多次 + first_cell = table.rows[0].cells[0].text.strip() + if first_cell and len(first_cell) > 3: + # 检查是否所有单元格都包含相同的文本 + all_same = all(first_cell in c.text for c in table.rows[0].cells) + if all_same: + return False # 这是模块标题表格,不是特殊表格 + + return True + except: + pass + return False + + def analyze_table(table): + """分析表格结构""" + info = {'header_idx': -1, 'title_idx': -1, 'desc_indices': [], + 'data_with_result': [], 'data_without_result': [], + 'is_special': is_special_table(table)} + + for row_idx, row in enumerate(table.rows): + cells = row.cells + if len(cells) < 2: + continue + row_text = ' '.join([c.text.strip().lower() for c in cells]) + first_cell = cells[0].text.strip() + + if is_header_row(row_text, cells): + info['header_idx'] = row_idx + elif is_title_row(row_text, cells): + info['title_idx'] = row_idx + elif is_description_row(row_text): + info['desc_indices'].append(row_idx) + elif is_data_row(first_cell): + if has_data_in_row(cells): + info['data_with_result'].append(row_idx) + else: + info['data_without_result'].append(row_idx) + return info + + def special_table_has_data(table): + """特殊表格是否有有效结果。 + + 支持多种结构: + 1. 普通项目表格:2-3行,cells[0]=ABB, cells[1]=项目名, cells[2]=Result + 2. 配对项目表格:3-4行,两个数据行(项目名 + Result),共享临床意义 + 注意:配对表格的ABB列(cells[0])可能为空,项目名在cells[1] + 3. 
11列表格(模板):cells[0]=ABB, cells[1]=项目名, cells[2]可能是项目名重复 + + 若所有数据行都没有有效内容,则认为该表格应被删除。 + """ + try: + rows = len(table.rows) + if rows < 2: + return False + + # 检查是否有任何有效的数据行 + has_valid_data = False + for ri in range(rows): + cells = table.rows[ri].cells + if len(cells) < 2: + continue + first_cell = (cells[0].text or '').strip() + second_cell = (cells[1].text or '').strip() if len(cells) > 1 else '' + third_cell = (cells[2].text or '').strip() if len(cells) > 2 else '' + row_text = ' '.join([c.text for c in cells]).lower() + + # 跳过Clinical Significance行 + if 'clinical significance' in row_text or '临床意义' in row_text: + continue + # 跳过表头行 + if first_cell.lower().startswith('abb') or ('project' in row_text and '项目' in row_text): + continue + + # 检查是否有有效内容(ABB列、项目名列或Result列) + # 配对表格的ABB列可能为空,但项目名列和Result列有内容 + has_content = False + + # 检查ABB列(第一列) + if first_cell and first_cell not in [' ', '\n'] and not first_cell.startswith('{{'): + has_content = True + + # 检查项目名列(第二列)- 配对表格的中文项目名 + if not has_content and second_cell and second_cell not in [' ', '\n']: + # 排除占位符 + if not second_cell.startswith('{{'): + has_content = True + + # 检查Result列(第三列) + if not has_content and third_cell and third_cell not in [' ', '\n', '-', '/']: + if not third_cell.startswith('{{'): + has_content = True + + if has_content: + has_valid_data = True + break + + return has_valid_data + except: + return False + + def table_has_any_data(table): + """检查表格是否有任何有效数据(用于模块删除判断)""" + # 先检查特殊表格 + if is_special_table(table): + return special_table_has_data(table) + + # 普通表格检查 + info = analyze_table(table) + return len(info['data_with_result']) > 0 + + # 0. 
先删除“特殊表格”中没有结果的整张表(否则后续逻辑会跳过它们) + removed_special_tables = 0 + for table in list(doc.tables): + # 跳过保护区域内的表格 + if id(table) in protected_tables: + continue + info = analyze_table(table) + if info['is_special'] and not special_table_has_data(table): + try: + table._tbl.getparent().remove(table._tbl) + removed_special_tables += 1 + except: + pass + + # 获取body中表格的顺序(只处理保护区域外的表格) + body = doc._body._body + table_order = [] + for elem in body: + if elem.tag.endswith('}tbl'): + for t in doc.tables: + if t._tbl is elem: + # 跳过保护区域内的表格 + if id(t) not in protected_tables: + table_order.append(t) + break + + # 第一遍:合并表格(表头下无数据,向后搜索找第一个有数据的表格) + tables_to_remove = set() + + for i in range(len(table_order)): + if table_order[i] in tables_to_remove: + continue + + t1 = table_order[i] + info1 = analyze_table(t1) + + # 如果t1本身就是特殊表格,不要往里合并东西 + if info1['is_special']: + continue + + # 条件:t1有表头但无数据 + if info1['header_idx'] >= 0 and len(info1['data_with_result']) == 0: + # 只在“下一个表头表格”之前搜索,避免跨模块吸走数据 + next_header_pos = None + for k in range(i + 1, len(table_order)): + if table_order[k] in tables_to_remove: + continue + k_info = analyze_table(table_order[k]) + + # 如果遇到特殊表格,视为边界,停止搜索 + if k_info['is_special']: + next_header_pos = k + break + + # 以“有表头但无数据”的表作为模块边界(数据表可能也带表头,不能当边界) + if k_info['header_idx'] >= 0 and len(k_info['data_with_result']) == 0: + next_header_pos = k + break + search_end = next_header_pos if next_header_pos is not None else len(table_order) + + # 在范围内收集所有“有数据且无表头”的表格 + candidates = [] + for j in range(i + 1, search_end): + if table_order[j] in tables_to_remove: + continue + candidate = table_order[j] + candidate_info = analyze_table(candidate) + + # 跳过特殊表格(不作为被合并对象) + if candidate_info['is_special']: + continue + + if len(candidate_info['data_with_result']) > 0: + candidates.append((candidate, candidate_info)) + + if not candidates: + continue + + # 用第一个候选数据表的“项目名”作为标题,覆盖t1标题(避免出现空标题) + title_text = '' + try: + first_candidate, first_candidate_info = candidates[0] 
+ if first_candidate_info.get('data_with_result'): + data_row_idx = first_candidate_info['data_with_result'][0] + if len(first_candidate.rows[data_row_idx].cells) > 1: + title_text = first_candidate.rows[data_row_idx].cells[1].text.strip() + if not title_text: + title_text = first_candidate.rows[data_row_idx].cells[0].text.strip() + except: + title_text = '' + + # 清空t1(保留表头行) + header_idx = info1['header_idx'] + title_row_idx = header_idx + 1 + + # 清空:删除表头行之后所有旧行,但尽量保留表头下一行作为“标题行结构” + keep_title_row = title_row_idx < len(t1.rows) + delete_from = (title_row_idx + 1) if keep_title_row else (header_idx + 1) + for ridx in range(len(t1.rows) - 1, delete_from - 1, -1): + try: + t1._tbl.remove(t1.rows[ridx]._tr) + removed_rows += 1 + except: + pass + + # 确保存在标题行:没有则插入一行(插入后重新通过t1.rows获取) + if not keep_title_row: + try: + new_tr = copy.deepcopy(t1.rows[header_idx]._tr) + t1._tbl.insert(title_row_idx, new_tr) + except: + pass + + # 写入标题:只在第一列写入“第一条数据项目名”,其余列清空 + try: + if title_row_idx < len(t1.rows): + title_row = t1.rows[title_row_idx] + for c in title_row.cells: + c.text = '' + if title_text: + title_row.cells[0].text = title_text + except: + pass + + # 将候选表格的标题/数据/描述复制到t1,并删除候选表格 + for candidate, candidate_info in candidates: + rows_to_copy = [] + rows_to_copy.extend(candidate_info['data_with_result']) + rows_to_copy.extend(candidate_info['desc_indices']) + + for row_idx in sorted(rows_to_copy): + src_row = candidate.rows[row_idx] + new_tr = copy.deepcopy(src_row._tr) + t1._tbl.append(new_tr) + + tables_to_remove.add(candidate) + merged_count += 1 + + # 删除被合并的表格 + for t in tables_to_remove: + try: + t._tbl.getparent().remove(t._tbl) + except: + pass + + # 第二遍:删除剩余的空数据行(跳过特殊表格和保护区域) + # 同时删除紧随其后的"Clinical Significance/临床意义"描述行,避免留下孤儿解释块 + for table in doc.tables: + # 跳过保护区域内的表格 + if id(table) in protected_tables: + continue + info = analyze_table(table) + # 跳过特殊表格 + if info['is_special']: + continue + + rows_to_remove = set() + for row_idx in info['data_without_result']: 
+ rows_to_remove.add(row_idx) + # 检查后续行是否是描述行(可能有多行描述) + next_idx = row_idx + 1 + while next_idx < len(table.rows): + try: + next_cells = table.rows[next_idx].cells + next_text = ' '.join([(c.text or '').strip().lower() for c in next_cells]) + # 检查是否是描述行 + if is_description_row(next_text): + rows_to_remove.add(next_idx) + next_idx += 1 + continue + # 也检查是否是空行或只有少量文字的行(可能是格式化问题) + if not next_text.strip() or len(next_text.strip()) < 5: + rows_to_remove.add(next_idx) + next_idx += 1 + continue + except: + pass + break + + # 额外检查:删除所有孤立的描述行(前面没有对应数据行的描述) + kept_data_rows = set(info['data_with_result']) - rows_to_remove + for desc_idx in info['desc_indices']: + # 检查这个描述行前面是否有保留的数据行 + has_data_before = False + for data_idx in kept_data_rows: + if data_idx < desc_idx: + # 检查data_idx和desc_idx之间是否没有其他数据行 + intervening_data = [d for d in kept_data_rows if data_idx < d < desc_idx] + if not intervening_data: + has_data_before = True + break + if not has_data_before: + rows_to_remove.add(desc_idx) + + for row_idx in sorted(rows_to_remove, reverse=True): + try: + table._tbl.remove(table.rows[row_idx]._tr) + removed_rows += 1 + except: + pass + + # 第二点五遍:补全合并后的标题行(表头下一行为空时,跳过特殊表格和保护区域) + for table in doc.tables: + # 跳过保护区域内的表格 + if id(table) in protected_tables: + continue + info = analyze_table(table) + # 跳过特殊表格 + if info['is_special']: + continue + if info['header_idx'] < 0: + continue + if len(info['data_with_result']) == 0: + continue + + title_row_idx = info['header_idx'] + 1 + if title_row_idx >= len(table.rows): + continue + + try: + title_row = table.rows[title_row_idx] + # 如果表头下一行本身就是数据行,则插入一个“空标题行”(复制表头行结构) + try: + first_cell = title_row.cells[0].text.strip() if title_row.cells else '' + if is_data_row(first_cell) and has_data_in_row(title_row.cells): + extracted_title = '' + try: + if len(title_row.cells) > 1: + extracted_title = title_row.cells[1].text.strip() + if not extracted_title: + extracted_title = title_row.cells[0].text.strip() + except: + extracted_title = 
'' + + header_tr = copy.deepcopy(table.rows[info['header_idx']]._tr) + table._tbl.insert(title_row_idx, header_tr) + title_row = table.rows[title_row_idx] + try: + for c in title_row.cells: + c.text = '' + if extracted_title: + title_row.cells[0].text = extracted_title + except: + pass + continue + except: + pass + + # 若标题行已有内容且不是空行,则不覆盖 + if any((c.text or '').strip() for c in title_row.cells): + continue + + first_data_idx = info['data_with_result'][0] + if first_data_idx >= len(table.rows): + continue + data_row = table.rows[first_data_idx] + + title_text = '' + if len(data_row.cells) > 1: + title_text = data_row.cells[1].text.strip() + if not title_text: + title_text = data_row.cells[0].text.strip() + if not title_text: + continue + + for c in title_row.cells: + c.text = '' + title_row.cells[0].text = title_text + except: + pass + + # 第三遍:删除所有没有数据的表格 + # 重要:跳过保护区域内的表格 + # 重要:保留模块标题表格(title_idx >= 0) + # 重要:保留表头表格(包含 Abb/Project/Result) + removed_tables = 0 + for table in list(doc.tables): + # 跳过保护区域内的表格 + if id(table) in protected_tables: + continue + info = analyze_table(table) + # 跳过特殊表格 - 这些是新生成的独立表格,必须保留 + if info['is_special']: + continue + # 跳过模块标题表格 - 这些是模块的标题行,必须保留 + if info['title_idx'] >= 0: + continue + # 跳过表头表格 - 这些是数据表格的表头,必须保留 + if info['header_idx'] >= 0: + continue + # 只要没有数据就删除整个表格 + if len(info['data_with_result']) == 0: + try: + table._tbl.getparent().remove(table._tbl) + removed_tables += 1 + except: + pass + + # 第3.5遍:删除重复的模块标题表格 + # 模块标题表格特征:只有1行,包含重复的模块名称 + seen_module_titles = set() + removed_duplicate_titles = 0 + for table in list(doc.tables): + if id(table) in protected_tables: + continue + # 检查是否是模块标题表格(只有1行,内容重复) + if len(table.rows) == 1: + row_text = ' '.join([c.text.strip() for c in table.rows[0].cells]).lower() + # 检查是否包含模块关键词且重复出现 + for kw in ['imaging', 'urine', 'blood count', 'blood type', 'coagulation', + 'infectious', 'electrolyte', 'liver', 'kidney', 'myocardial', + 'thyroid', 'lipid', 'blood sugar', 'thromboembolism', 
'bone', + 'microelement', 'lymphocyte', 'humoral', 'inflammatory', + 'autoantibody', 'tumor', 'female hormone', 'male hormone', + 'female-specific', '影像', '尿液', '血常规', '血型', '凝血', + '传染病', '电解质', '肝功能', '肾功能', '心肌酶', '甲状腺', + '血脂', '血糖', '心脑血管', '骨代谢', '微量元素', '淋巴细胞', + '体液免疫', '炎症', '自身抗体', '肿瘤', '女性激素', '男性激素', '女性专项']: + if kw in row_text and row_text.count(kw) >= 2: + # 这是模块标题表格 + if kw in seen_module_titles: + # 重复的标题表格,删除 + try: + table._tbl.getparent().remove(table._tbl) + removed_duplicate_titles += 1 + except: + pass + else: + seen_module_titles.add(kw) + break + + if removed_duplicate_titles > 0: + print(f" [清理] 删除 {removed_duplicate_titles} 个重复的模块标题表格") + + # 重要:在模块清理之前,先保存并重新加载文档,确保索引正确 + safe_save(doc, output_path, template_path) + doc = Document(output_path) + + # 第四遍:删除无数据的模块(包括标题、文字、图片等) + from docx.text.paragraph import Paragraph + + module_keywords_cleanup = [ + 'urine detection', 'urine test', '尿液检测', 'complete blood count', '血常规', + 'blood sugar', 'glucose', '血糖', 'lipid panel', 'lipid profile', '血脂', + 'blood type', '血型', 'coagulation', 'blood coagulation', '凝血', + 'infectious disease', 'four infectious', '传染病', '传染病四项', + 'electrolyte', 'serum electrolyte', '电解质', '血清电解质', + 'liver function', '肝功能', 'kidney function', '肾功能', + 'cardiac enzyme', 'myocardial enzyme', 'enzyme spectrum', '心肌酶', '心肌酶谱', + 'thyroid', 'thyroid function', '甲状腺', '甲状腺功能', + 'cardiovascular', 'thromboembolism', '心血管', '心脑血管', + 'bone metabolism', '骨代谢', + 'trace element', 'heavy metal', 'microelement', '微量元素', '重金属', + 'lymphocyte', 'lymphocyte subpopulation', '淋巴细胞', '淋巴细胞亚群', + 'humoral immunity', '体液免疫', 'immune function', '免疫功能', + 'inflammation', 'inflammatory', '炎症', '炎症反应', + 'autoantibody', 'autoimmune', '自身抗体', '自身免疫', + 'female hormone', '女性激素', '女性荷尔蒙', 'male hormone', '男性激素', '男性荷尔蒙', + 'gynecological', 'female-specific', '妇科', '女性专项', + 'tumor marker', '肿瘤标记物', '肿瘤标志物', + 'imaging', '影像', + ] + exclude_keywords_cleanup = ['health program', 'health report', 
'abnormal', '异常', 'overall', 'assessment', 'clinical significance', '临床意义', 'functional medical health advice', '功能医学健康建议', 'medical intervention', '医学干预', 'nutrition', '营养', 'exercise', '运动', 'sleep', '睡眠', 'lifestyle', '生活方式', 'follow-up', '随访', 'functional medical team', '功能医学团队', + '(一)', '(二)', '(三)', '(四)', '(五)', '(六)', + '复查', '监测', '标志物', '血液学', '状态', + 'bhrt', 'ivnt', 'msc', '干细胞', '静脉营养', '激素替代', + '建议', '方案', '治疗', '调理', '改善', '优化'] + + protected_section_keywords = ['functional medical health advice', '功能医学健康建议', + 'overall health assessment', '整体健康状况', + 'abnormal index', '异常指标', + 'health report analysis', '健康报告分析', + 'medical intervention', '医学干预', + 'nutrition intervention', '营养干预', + 'exercise intervention', '运动干预', + 'sleep', '睡眠', 'lifestyle', '生活方式', + 'follow-up', '随访', 'functional medical team', '功能医学团队'] + + def is_protected_section_cleanup(text): + if not text: + return False + text_lower = text.lower().strip() + return any(kw in text_lower for kw in protected_section_keywords) + + def is_module_title_para_cleanup(text): + if not text or len(text) > 100: + return False + text_lower = text.lower().strip() + if text_lower.startswith('(i)') or text_lower.startswith('(ii)') or text_lower.startswith('(iii)'): + return False + if text_lower.startswith('i.') or text_lower.startswith('ii.') or text_lower.startswith('iii.'): + return False + if any(ex in text_lower for ex in exclude_keywords_cleanup): + return False + return any(kw in text_lower for kw in module_keywords_cleanup) + + def is_module_title_table_cleanup(table): + if len(table.rows) < 1 or len(table.rows) > 2: + return False + try: + full_text = ' '.join([c.text.strip() for row in table.rows for c in row.cells]).lower() + if 'clinical significance' in full_text or '临床意义' in full_text: + return False + if 'abb' in full_text and 'project' in full_text and 'result' in full_text: + return False + + # 模块标题关键词(包含变体拼写) + module_title_names = [ + 'urine detection', 'urine test', '尿液检测', + 
'complete blood count', '血常规', + 'blood sugar', '血糖', 'lipid profile', '血脂', 'blood type', '血型', + 'blood coagulation', '凝血功能', 'four infectious diseases', '传染病四项', + 'serum electrolytes', '血电解质', 'liver function', '肝功能', + 'kidney function', '肾功能', 'myocardial enzyme', '心肌酶', + 'thyroid function', '甲状腺功能', 'thromboembolism', '心脑血管', + 'bone metabolism', '骨代谢', 'microelement', '微量元素', + 'humoral immunity', '体液免疫', 'inflammatory reaction', '炎症反应', + 'autoantibody', '自身抗体', 'female hormone', '女性激素', + 'male hormone', '男性激素', 'tumor markers', '肿瘤标记物', + 'lymphocyte', 'lymphocyto', '淋巴细胞', '淋巴细胞亚群', + 'imaging', '影像学', 'female-specific', '女性专项' + ] + + row_text = ' '.join([c.text.strip() for c in table.rows[0].cells]).lower() + # 放宽条件:只要标题出现1次即可(之前要求2次太严格) + for title in module_title_names: + if title in row_text: + return True + return False + except: + return False + + body = doc._body._body + body_children = list(body) + + tbl_map = {} + for t in doc.tables: + tbl_map[id(t._tbl)] = t + + # 精确识别模块ID(按优先级排列,female hormone 必须在 male hormone 之前匹配,避免子串冲突) + _MODULE_IDENTIFY_RULES = [ + ('female hormone', 'female hormone'), ('女性荷尔蒙', 'female hormone'), ('女性激素', 'female hormone'), + ('male hormone', 'male hormone'), ('男性荷尔蒙', 'male hormone'), ('男性激素', 'male hormone'), + ('female-specific', 'female-specific'), ('女性专项', 'female-specific'), + ('urine detection', 'urine'), ('urine test', 'urine'), ('尿液检测', 'urine'), + ('complete blood count', 'blood count'), ('血常规', 'blood count'), + ('blood sugar', 'blood sugar'), ('血糖', 'blood sugar'), + ('lipid profile', 'lipid'), ('血脂', 'lipid'), + ('blood type', 'blood type'), ('血型', 'blood type'), + ('blood coagulation', 'coagulation'), ('凝血功能', 'coagulation'), ('凝血', 'coagulation'), + ('four infectious', 'infectious'), ('传染病', 'infectious'), + ('serum electrolyte', 'electrolyte'), ('血电解质', 'electrolyte'), ('电解质', 'electrolyte'), + ('liver function', 'liver'), ('肝功能', 'liver'), + ('kidney function', 'kidney'), ('肾功能', 'kidney'), + 
('myocardial enzyme', 'myocardial'), ('心肌酶', 'myocardial'), + ('thyroid function', 'thyroid'), ('甲状腺功能', 'thyroid'), ('甲状腺', 'thyroid'), + ('thromboembolism', 'thrombo'), ('心脑血管', 'thrombo'), + ('bone metabolism', 'bone'), ('骨代谢', 'bone'), + ('microelement', 'microelement'), ('微量元素', 'microelement'), + ('humoral immunity', 'humoral'), ('体液免疫', 'humoral'), + ('inflammatory', 'inflammatory'), ('炎症', 'inflammatory'), + ('autoantibody', 'autoantibody'), ('自身抗体', 'autoantibody'), + ('tumor marker', 'tumor'), ('肿瘤标记', 'tumor'), + ('lymphocyte', 'lymphocyte'), ('lymphocyto', 'lymphocyte'), ('淋巴细胞', 'lymphocyte'), + ('imaging', 'imaging'), ('影像', 'imaging'), + ] + + def identify_module_id(title_text): + """从模块标题文本精确识别模块ID""" + text_lower = title_text.lower() + for pattern, mid in _MODULE_IDENTIFY_RULES: + if pattern in text_lower: + return mid + return None + + # 找出所有模块标题表格及其位置(统一使用 is_module_title_table_cleanup + identify_module_id) + module_title_positions = [] # [(position, table, module_id)] + for i, elem in enumerate(body_children): + if elem.tag.endswith('}tbl'): + for t in doc.tables: + if t._tbl is elem: + if is_module_title_table_cleanup(t): + try: + title_text = ' '.join([c.text.strip() for c in t.rows[0].cells]) + except: + title_text = '' + mid = identify_module_id(title_text) + if mid: + module_title_positions.append((i, t, mid)) + break + + # 检查每个模块是否有数据表格 + modules_with_data = set() + for idx, (pos, title_table, module_id) in enumerate(module_title_positions): + next_pos = module_title_positions[idx + 1][0] if idx + 1 < len(module_title_positions) else len(body_children) + + has_data = False + for j in range(pos + 1, next_pos): + elem = body_children[j] + if elem.tag.endswith('}tbl'): + for t in doc.tables: + if t._tbl is elem: + if not is_module_title_table_cleanup(t) and table_has_any_data(t): + has_data = True + break + if has_data: + break + + if has_data: + modules_with_data.add(module_id) + + print(f" [模块清理] 有数据的模块: {sorted(modules_with_data)}") + + # 
根据性别判断结果,决定删除哪个荷尔蒙模块 + # 将中文"男性"/"女性"转换为英文"male"/"female" + gender_from_ocr = patient_info.get('gender', '') if patient_info else '' + if gender_from_ocr == '男性': + detected_gender = 'male' + elif gender_from_ocr == '女性': + detected_gender = 'female' + else: + # 如果没有从OCR提取到性别,使用默认值(女性) + detected_gender = 'female' + + # 模块ID到描述段落搜索关键词的映射(用于清理文档中残留的描述段落) + module_desc_mapping = { + 'urine': ('urine detection', '尿液检测'), + 'blood count': ('complete blood count', '血常规'), + 'blood sugar': ('blood sugar', '血糖'), + 'lipid': ('lipid profile', '血脂'), + 'blood type': ('blood type', '血型'), + 'coagulation': ('blood coagulation', '凝血'), + 'infectious': ('four infectious', '传染病'), + 'electrolyte': ('serum electrolyte', '电解质'), + 'liver': ('liver function', '肝功能'), + 'kidney': ('kidney function', '肾功能'), + 'myocardial': ('myocardial enzyme', '心肌酶'), + 'thyroid': ('thyroid function', '甲状腺'), + 'thrombo': ('thromboembolism', '心脑血管'), + 'bone': ('bone metabolism', '骨代谢'), + 'microelement': ('microelement', '微量元素'), + 'humoral': ('humoral immunity', '体液免疫'), + 'inflammatory': ('inflammatory', '炎症'), + 'autoantibody': ('autoantibody', '自身抗体'), + 'female hormone': ('female hormone', '女性荷尔蒙'), + 'male hormone': ('male hormone', '男性荷尔蒙'), + 'tumor': ('tumor marker', '肿瘤标记'), + 'lymphocyte': ('lymphocyto', '淋巴细胞'), + 'imaging': ('imaging', '影像'), + 'female-specific': ('female-specific', '女性专项'), + } + + # 荷尔蒙模块清理逻辑:根据性别判断结果,只保留一个荷尔蒙模块 + if detected_gender == 'male': + if 'female hormone' in modules_with_data: + print(f" [模块清理] 性别为男性,强制删除女性荷尔蒙模块") + modules_with_data.discard('female hormone') + else: # female + if 'male hormone' in modules_with_data: + print(f" [模块清理] 性别为女性,强制删除男性荷尔蒙模块") + modules_with_data.discard('male hormone') + + # 动态构建需要清理描述的空模块列表(所有没有数据的模块) + empty_modules_to_clean = [] + for module_id, (en_title, cn_title) in module_desc_mapping.items(): + if module_id not in modules_with_data: + empty_modules_to_clean.append((module_id, en_title, cn_title)) + + print(f" [模块清理] 
需要删除描述的空模块: {[m[0] for m in empty_modules_to_clean]}") + + removed_modules = 0 + print(f" [模块清理] 找到 {len(module_title_positions)} 个模块起点") + for idx in range(len(module_title_positions) - 1, -1, -1): + start_i, _tbl, module_id = module_title_positions[idx] + end_i = module_title_positions[idx + 1][0] if idx + 1 < len(module_title_positions) else len(body_children) + try: + module_title = ' '.join([c.text.strip() for c in _tbl.rows[0].cells])[:40] + except Exception: + module_title = 'Unknown' + + module_elements = body_children[start_i:end_i] + + if is_protected_section_cleanup(module_title): + continue + + # Decide from the detected gender whether a hormone module must be force-removed (exact match on module_id) + should_force_remove = False + if module_id == 'female hormone' and detected_gender == 'male': + should_force_remove = True + print(f" [模块清理] 性别为男性,强制删除女性荷尔蒙模块: {module_title}") + elif module_id == 'male hormone' and detected_gender == 'female': + should_force_remove = True + print(f" [模块清理] 性别为女性,强制删除男性荷尔蒙模块: {module_title}") + + # If the module has data and does not need to be force-removed, skip it + if not should_force_remove and module_id and module_id in modules_with_data: + continue + + # Fallback check: scan the tables inside the module for actual data + module_has_data = False + for e in module_elements: + if e.tag.endswith('}tbl'): + for t in doc.tables: + if t._tbl is e: + if not is_module_title_table_cleanup(t) and table_has_any_data(t): + module_has_data = True + break + + if should_force_remove or not module_has_data: + # Safe boundary (forward): scan forward from start_i+1 for the next module's title paragraph, so that the next module's title and description are not deleted + safe_end = end_i + for ei in range(start_i + 1, end_i): + elem = body_children[ei] + if elem.tag.endswith('}p'): + p_text = ''.join(elem.itertext()).strip() + if is_module_title_para_cleanup(p_text): + # Confirm that this title paragraph belongs to another module (not the current one) + p_mid = identify_module_id(p_text) + if p_mid and p_mid != module_id: + safe_end = ei + break + + # Safe boundary (backward): scan backward from start_i-1 for the current module's title and description paragraphs + # These paragraphs sit before the title table and must be deleted together with it + safe_start = start_i + for ei in range(start_i - 1, -1, -1): + elem = body_children[ei] + if elem.tag.endswith('}tbl'): + # Reached a table (the previous module's data table); stop + break 
+ if elem.tag.endswith('}p'): + p_text = ''.join(elem.itertext()).strip() + if is_module_title_para_cleanup(p_text): + p_mid = identify_module_id(p_text) + if p_mid and p_mid != module_id: + # 属于其他模块的标题段落,停止 + break + safe_start = ei + + removed_in_module = 0 + for ei in range(safe_end - 1, safe_start - 1, -1): + try: + body_children[ei].getparent().remove(body_children[ei]) + removed_in_module += 1 + except: + pass + removed_modules += 1 + if should_force_remove: + print(f" [模块清理] 删除荷尔蒙模块(根据性别): {module_title} ({removed_in_module} 个元素)") + else: + print(f" [模块清理] 删除空模块: {module_title} ({removed_in_module} 个元素)") + + # 删除空模块的描述段落 + if empty_modules_to_clean: + # 重新获取body_children(因为上面可能删除了一些元素) + body_children = list(body) + + from docx.oxml.ns import qn + + # 构建数据模块关键词集合(用于安全检查,防止误删有数据模块的内容) + data_module_keywords = set() + for mid in modules_with_data: + if mid in module_desc_mapping: + en, cn = module_desc_mapping[mid] + data_module_keywords.add(en.lower()) + data_module_keywords.add(cn) + + # 找到所有描述段落标题的位置 + desc_title_positions = [] # [(position, module_id, title_text)] + for i, elem in enumerate(body_children): + if elem.tag.endswith('}p'): + text_parts = [] + for t in elem.iter(qn('w:t')): + if t.text: + text_parts.append(t.text) + text = ''.join(text_parts).strip() + text_lower = text.lower() + + # 检查是否是描述段落标题(包含模块名称) + # 注意:描述标题可能较长(如 "Thyroid Function Test Result Analysis 甲状腺功能检测结果分析"),放宽到200字符 + if len(text) < 200: + for module_id, en_title, cn_title in empty_modules_to_clean: + if en_title in text_lower and cn_title in text: + desc_title_positions.append((i, module_id, text[:40])) + break + + # 找到所有可能的描述段落标题(用于确定边界) + # 关键:必须检测所有模块的描述标题(包括有数据的模块),作为删除边界 + all_desc_titles = [ + 'urine detection', 'complete blood count', 'blood sugar', 'lipid profile', + 'blood type', 'blood coagulation', 'four infectious', 'serum electrolyte', + 'liver function', 'kidney function', 'myocardial enzyme', 'thyroid function', + 'thromboembolism', 'bone metabolism', 
'microelement', 'humoral immunity', + 'inflammatory', 'autoantibody', 'female hormone', 'male hormone', + 'tumor marker', 'lymphocyte', 'lymphocyto', 'imaging', 'female-specific' + ] + + all_title_positions = [] + for i, elem in enumerate(body_children): + if elem.tag.endswith('}p'): + text_parts = [] + for t in elem.iter(qn('w:t')): + if t.text: + text_parts.append(t.text) + text = ''.join(text_parts).strip() + text_lower = text.lower() + + # 放宽长度限制到200字符,避免遗漏长标题导致边界检测失败 + if len(text) < 200: + for title in all_desc_titles: + if title in text_lower: + all_title_positions.append(i) + break + + all_title_positions.sort() + print(f" [描述清理] 检测到 {len(desc_title_positions)} 个空模块描述标题, {len(all_title_positions)} 个边界标题") + + # 删除空模块的描述段落 + removed_desc = 0 + for pos, module_id, title_text in sorted(desc_title_positions, reverse=True): + # 找到下一个描述标题的位置 + next_pos = len(body_children) + for p in all_title_positions: + if p > pos: + next_pos = p + break + + # 安全检查:扫描待删除范围,如果包含有数据模块的关键词则截断 + safe_end = next_pos + for i in range(pos + 1, next_pos): + if i < len(body_children): + elem_text = ''.join(body_children[i].itertext()).strip().lower() + for dkw in data_module_keywords: + if dkw.lower() in elem_text: + # 发现有数据模块的内容,截断删除范围 + safe_end = i + print(f" [描述清理] 安全截断: {title_text} 在位置 {i} 发现数据模块关键词 '{dkw}',从 {next_pos} 截断到 {safe_end}") + break + if safe_end != next_pos: + break + + # 删除从当前标题到安全边界之间的所有元素 + elements_to_remove = [] + for i in range(pos, safe_end): + if i < len(body_children): + elements_to_remove.append(body_children[i]) + + for elem in reversed(elements_to_remove): + try: + elem.getparent().remove(elem) + removed_desc += 1 + except: + pass + + print(f" [描述清理] 删除空模块描述: {title_text} ({len(elements_to_remove)} 个元素, 范围 {pos}-{safe_end})") + + # 使用安全保存 + safe_save(doc, output_path, template_path) + print(f"\n✓ 清理完成: 删除 {removed_rows} 行, 合并 {merged_count} 对表格, 删除 {removed_tables} 个空表格, 删除 {removed_special_tables} 个空特殊表格") + print(f"✓ 模块清理: 删除 {removed_modules} 个无数据模块") 
+ + return doc + + +def format_document_structure(doc_path: str, output_path: str): + """ + 整理Word文档结构: + 1. 清理多余的空白段落(连续空段落只保留一个) + 2. 在模块标题前插入分页符(确保每个模块从新页开始) + + 重要:跳过保护区域(前四页)和"客户功能医学检测档案"区域的所有元素 + """ + from docx import Document + from docx.oxml.ns import qn + from docx.oxml import OxmlElement + from xml_safe_save import safe_save + + template_path_local = Path(__file__).parent / "template_complete.docx" + + doc = Document(doc_path) + body = doc.element.body + + # 获取保护边界位置 + protection_boundary = find_health_program_boundary(doc) + print(f" [保护] 格式整理时跳过前 {protection_boundary} 个元素") + + # 获取"客户功能医学检测档案"区域位置 + exam_file_start, exam_file_end = find_examination_file_region(doc) + if exam_file_start >= 0: + print(f" [保护] 格式整理时跳过'客户功能医学检测档案'区域: {exam_file_start}-{exam_file_end}") + + # 模块标题关键词(与清理函数保持一致) + module_keywords = [ + 'urine detection', 'urine test', '尿液检测', 'complete blood count', '血常规', + 'blood sugar', 'glucose', '血糖', 'lipid panel', 'lipid profile', '血脂', + 'blood type', '血型', 'coagulation', 'blood coagulation', '凝血', + 'infectious disease', 'four infectious', '传染病', '传染病四项', + 'electrolyte', 'serum electrolyte', '电解质', '血清电解质', + 'liver function', '肝功能', 'kidney function', '肾功能', + 'cardiac enzyme', 'myocardial enzyme', 'enzyme spectrum', '心肌酶', '心肌酶谱', + 'thyroid', 'thyroid function', '甲状腺', '甲状腺功能', + 'cardiovascular', 'thromboembolism', '心血管', '心脑血管', + 'bone metabolism', '骨代谢', + 'trace element', 'heavy metal', 'microelement', '微量元素', '重金属', + 'lymphocyte', 'lymphocyte subpopulation', '淋巴细胞', '淋巴细胞亚群', + 'humoral immunity', '体液免疫', 'immune function', '免疫功能', + 'inflammation', 'inflammatory', '炎症', '炎症反应', + 'autoantibody', 'autoimmune', '自身抗体', '自身免疫', + 'female hormone', '女性激素', '女性荷尔蒙', 'male hormone', '男性激素', '男性荷尔蒙', + 'gynecological', 'female-specific', '妇科', '女性专项', + 'tumor marker', '肿瘤标记物', '肿瘤标志物', + 'imaging', '影像', + ] + + exclude_keywords = ['health program', 'health report', 'abnormal', '异常', 'overall', 'assessment', + 'medical 
intervention', '医学干预', 'functional medical health advice', '功能医学健康建议'] + + def is_module_title_paragraph(text): + """检查段落是否是模块标题""" + if not text or len(text) > 100: + return False + text_lower = text.lower().strip() + + # 排除章节大标题(以罗马数字或括号数字开头) + if text_lower.startswith('(i)') or text_lower.startswith('(ii)') or text_lower.startswith('(iii)'): + return False + if text_lower.startswith('i.') or text_lower.startswith('ii.') or text_lower.startswith('iii.'): + return False + + if any(ex in text_lower for ex in exclude_keywords): + return False + return any(kw in text_lower for kw in module_keywords) + + def is_module_title_table(elem): + """检查表格元素是否是模块标题表格""" + text = ''.join(elem.itertext()).strip() + if not text or len(text) > 200: + return False + text_lower = text.lower() + + # 排除章节大标题 + if any(ex in text_lower for ex in exclude_keywords): + return False + + # 检查是否包含模块关键词 + for kw in module_keywords: + if kw in text_lower: + # 模块标题表格通常会重复模块名称多次 + if text_lower.count(kw) >= 2: + return True + return False + + def is_in_protected_region(idx): + """检查索引是否在保护区域内""" + # 检查是否在前四页保护区域内 + if idx < protection_boundary: + return True + # 检查是否在"客户功能医学检测档案"区域内 + if exam_file_start >= 0 and exam_file_start <= idx < exam_file_end: + return True + return False + + def create_page_break_paragraph(): + """创建包含分页符的段落""" + p = OxmlElement('w:p') + r = OxmlElement('w:r') + br = OxmlElement('w:br') + br.set(qn('w:type'), 'page') + r.append(br) + p.append(r) + return p + + # 第一步:清理多余的空白段落和占位符段落(跳过保护区域) + removed_count = 0 + children = list(body) + prev_was_empty_p = False + + # 需要删除的占位符文本 + placeholder_texts = ['testing result检测结果', 'testing result 检测结果'] + + for i, elem in enumerate(children): + # 跳过保护区域(包括前四页和"客户功能医学检测档案"区域) + if is_in_protected_region(i): + prev_was_empty_p = False # 重置状态,避免跨区域删除 + continue + if elem.tag.endswith('}p'): + text = ''.join(elem.itertext()).strip() + text_lower = text.lower().replace(' ', '') + has_break = 
elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br') is not None + + # 删除 "Testing Result检测结果" 占位符段落 + if any(ph.replace(' ', '') in text_lower for ph in placeholder_texts): + try: + body.remove(elem) + removed_count += 1 + continue + except: + pass + + if not text and not has_break: + if prev_was_empty_p: + try: + body.remove(elem) + removed_count += 1 + except: + pass + else: + prev_was_empty_p = True + else: + prev_was_empty_p = False + else: + prev_was_empty_p = False + + # 第二步:在模块标题前插入分页符(每个模块都需要,跳过保护区域) + # 注意:模块标题可能是段落(<p>)或表格(<tbl>) + # 重新计算保护区域边界(因为第一步删除元素后位置偏移) + protection_boundary = find_health_program_boundary(doc) + exam_file_start, exam_file_end = find_examination_file_region(doc) + pagebreak_count = 0 + children = list(body) # 重新获取 + + for i, elem in enumerate(children): + # 跳过保护区域 + if is_in_protected_region(i): + continue + + is_title = False + + # 检查段落类型的模块标题 + if elem.tag.endswith('}p'): + text = ''.join(elem.itertext()).strip() + if is_module_title_paragraph(text): + is_title = True + + # 检查表格类型的模块标题 + elif elem.tag.endswith('}tbl'): + if is_module_title_table(elem): + is_title = True + + if is_title: + # 检查前面是否已经有分页符 + has_pagebreak_before = False + if i > 0: + prev_elem = children[i-1] + if prev_elem.tag.endswith('}p'): + prev_break = prev_elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br') + if prev_break is not None and prev_break.get(qn('w:type')) == 'page': + has_pagebreak_before = True + + if not has_pagebreak_before: + # 在模块标题前插入分页符 + pb = create_page_break_paragraph() + elem.addprevious(pb) + pagebreak_count += 1 + + # 第2.3步:清理特定模块后的空白页 + # 特殊处理:某些模块后面容易产生空白页(凝血功能、骨代谢等) + def clean_module_trailing_blanks(body, module_keywords, next_module_keywords): + """清理指定模块数据表格前的多余空白段落""" + children = list(body) + removed_count = 0 + + # 找到模块标题表格的位置(数据区域开始) + for i, elem in enumerate(children): + if elem.tag.endswith('}tbl'): + text = ''.join(elem.itertext()).strip().lower() + if any(kw in text for kw in module_keywords): + # 找到了模块标题表格,检查前面是否有多余的空段落 + # 往前查找,删除分页符前的空段落(保留一个分页符) + j = i - 1 + page_break_found = False + while j >= 0: + prev_elem = children[j] + if prev_elem.tag.endswith('}p'): + prev_text = ''.join(prev_elem.itertext()).strip() + has_break = prev_elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br') is not None + + if not prev_text and not has_break: + # 空段落,删除 + try: + body.remove(prev_elem) + removed_count += 1 + except: + pass + elif has_break and not prev_text: + # 分页符段落 + if page_break_found: + #
已经有一个分页符了,删除多余的 + try: + body.remove(prev_elem) + removed_count += 1 + except: + pass + else: + page_break_found = True + else: + # 有内容的段落,停止 + break + else: + # 不是段落,停止 + break + j -= 1 + # 重新获取children + children = list(body) + + return removed_count + + # 清理凝血功能模块数据表格前的空白 + removed = clean_module_trailing_blanks(body, ['coagulation', '凝血'], ['infectious', '传染病']) + if removed > 0: + print(f" 🧹 清理凝血功能模块前 {removed} 个空白元素") + + # 清理骨代谢模块数据表格前的空白 + removed = clean_module_trailing_blanks(body, ['bone metabolism', '骨代谢'], ['microelement', '微量元素']) + if removed > 0: + print(f" 🧹 清理骨代谢模块前 {removed} 个空白元素") + + # 清理骨代谢模块数据表格后、微量元素分页符前的空段落 + def clean_between_modules(body, current_module_keywords, next_module_keywords): + """清理当前模块最后一个数据表格后、下一个模块分页符前的空段落""" + children = list(body) + removed_count = 0 + w_ns = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' + + # 找到下一个模块标题的位置 + next_module_pos = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if any(kw in text for kw in next_module_keywords): + next_module_pos = i + break + + if next_module_pos < 0: + return 0 + + # 从下一个模块标题往前找,删除空段落(保留一个分页符) + j = next_module_pos - 1 + page_break_found = False + while j >= 0: + elem = children[j] + if elem.tag.endswith('}p'): + text = ''.join(elem.itertext()).strip() + br_elem = elem.find(f'.//{w_ns}br') + has_break = br_elem is not None + break_type = br_elem.get(f'{w_ns}type', '') if br_elem is not None else '' + + if not text and not has_break: + # 空段落,删除 + try: + body.remove(elem) + removed_count += 1 + except: + pass + elif has_break and break_type == 'page' and not text: + # 分页符段落 + if page_break_found: + # 已经有一个分页符了,删除多余的 + try: + body.remove(elem) + removed_count += 1 + except: + pass + else: + page_break_found = True + # 找到分页符,停止(保留这个分页符) + break + else: + # 有内容的段落或其他类型的换行,停止 + break + elif elem.tag.endswith('}tbl'): + # 遇到表格,停止 + break + j -= 1 + + return removed_count + + removed = clean_between_modules(body, ['bone 
metabolism', '骨代谢'], ['microelement', '微量元素']) + if removed > 0: + print(f" 🧹 清理骨代谢模块后 {removed} 个空白元素") + + # 第2.5步:在保护区域之后的所有图片前添加分页符 + # 重要:只处理保护区域之后的图片,前四页的图片不能添加分页符 + safe_save(doc, output_path, template_path_local) + doc = Document(output_path) + body = doc.element.body + children = list(body) + health_program_pos = find_health_program_boundary(doc) + + print(f" [图片分页] 保护边界位置: {health_program_pos}") + + # 模块标题关键词(用于判断图片是否是页面底部的logo图片) + module_keywords = [ + 'urine', 'blood', 'sugar', 'lipid', 'coagulation', 'infectious', 'electrolyte', + 'liver', 'kidney', 'myocardial', 'thyroid', 'thromboembolism', 'bone', 'microelement', + 'immunity', 'inflammatory', 'autoantibody', 'hormone', 'tumor', 'lymphocyte', 'imaging', + '尿液', '血常规', '血糖', '血脂', '凝血', '传染病', '电解质', '肝功能', '肾功能', + '心肌酶', '甲状腺', '血栓', '骨代谢', '微量元素', '免疫', '炎症', '自身抗体', + '激素', '肿瘤', '淋巴', '影像' + ] + + def is_logo_image(children, img_idx): + """检查图片是否是页面底部的logo图片(logo后面通常紧跟着下一个模块标题)""" + # 检查图片后面的几个元素 + for j in range(img_idx + 1, min(img_idx + 5, len(children))): + next_elem = children[j] + next_text = ''.join(next_elem.itertext()).strip().lower() + # 如果后面紧跟着模块标题,说明这是logo图片 + if any(kw in next_text for kw in module_keywords): + return True + return False + + # 先收集所有需要添加分页符的图片元素 + # 注意:不再在图片前添加分页符,因为这会导致空白页 + # 分页符应该在模块标题前添加,而不是在logo图片前 + images_need_pagebreak = [] + # 暂时禁用图片分页符功能,因为它会导致空白页 + # for i, elem in enumerate(children): + # ... 
+ + # 然后统一添加分页符(避免循环中修改列表导致的问题) + image_pagebreak_count = 0 + for elem in images_need_pagebreak: + pb = create_page_break_paragraph() + elem.addprevious(pb) + image_pagebreak_count += 1 + + if image_pagebreak_count > 0: + print(f" 📷 在 {image_pagebreak_count} 个图片前插入分页符") + + # 第三步:清理文档末尾的空白内容(空段落、分页符、空表格) + # 从后往前删除,直到遇到有内容的元素 + children = list(body) + removed_tail = 0 + for i in range(len(children) - 1, -1, -1): + elem = children[i] + tag = elem.tag.split('}')[-1] + + # 跳过sectPr(文档设置) + if tag == 'sectPr': + continue + + # 检查是否是空段落或只有分页符的段落 + if tag == 'p': + text = ''.join(elem.itertext()).strip() + if not text: + try: + body.remove(elem) + removed_tail += 1 + continue + except: + pass + else: + break # 遇到有内容的段落,停止 + + # 检查是否是空表格(只有标题行没有数据) + elif tag == 'tbl': + # 找到对应的Table对象 + is_empty_table = True + for t in doc.tables: + if t._tbl is elem: + # 检查表格是否有实际数据 + for row in t.rows: + row_text = ' '.join([c.text.strip() for c in row.cells]).lower() + if row_text and 'clinical significance' not in row_text: + # 检查是否是数据行(包含数字或结果) + import re + if re.search(r'\d', row_text) or any(kw in row_text for kw in ['positive', 'negative', 'normal']): + is_empty_table = False + break + break + + if is_empty_table: + try: + body.remove(elem) + removed_tail += 1 + continue + except: + pass + else: + break # 遇到有数据的表格,停止 + else: + break # 遇到其他类型元素,停止 + + if removed_tail > 0: + print(f" 🧹 清理文档末尾 {removed_tail} 个空白元素") + + # 第三步:清理连续的分页符(避免空白页) + # 重新加载文档 + safe_save(doc, output_path, template_path_local) + doc = Document(output_path) + body = doc.element.body + children = list(body) + w_ns = '{http://schemas.openxmlformats.org/wordprocessingml/2006/main}' + + removed_pagebreaks = 0 + + # 清理分页符前面的空段落(这会导致空白页) + i = 0 + while i < len(children): + elem = children[i] + if elem.tag.endswith('}p'): + br = elem.find(f'.//{w_ns}br') + if br is not None and br.get(f'{w_ns}type') == 'page': + text = ''.join(elem.itertext()).strip() + if not text: # 这是一个分页符段落 + # 检查前面是否有空段落,如果有就删除 + if i > 0: + 
prev_elem = children[i - 1] + if prev_elem.tag.endswith('}p'): + prev_text = ''.join(prev_elem.itertext()).strip() + prev_br = prev_elem.find(f'.//{w_ns}br') + if not prev_text and prev_br is None: + # 前面是空段落,删除它 + try: + body.remove(prev_elem) + children = list(body) + removed_pagebreaks += 1 + continue # 不增加i,继续检查 + except: + pass + i += 1 + + # 清理连续的分页符 + children = list(body) + i = 0 + while i < len(children) - 1: + elem = children[i] + next_elem = children[i + 1] + + if elem.tag.endswith('}p'): + br = elem.find(f'.//{w_ns}br') + if br is not None and br.get(f'{w_ns}type') == 'page': + text = ''.join(elem.itertext()).strip() + if not text: + if next_elem.tag.endswith('}p'): + next_br = next_elem.find(f'.//{w_ns}br') + next_text = ''.join(next_elem.itertext()).strip() + + if next_br is not None and next_br.get(f'{w_ns}type') == 'page' and not next_text: + try: + body.remove(elem) + children = list(body) + removed_pagebreaks += 1 + continue + except: + pass + + elif not next_text and next_br is None: + try: + body.remove(next_elem) + children = list(body) + removed_pagebreaks += 1 + continue + except: + pass + i += 1 + + # 第四步:删除表头前面的多余分页符 + # 表头前面不应该有分页符(分页符应该在模块标题前面) + children = list(body) + removed_header_pagebreaks = 0 + i = 1 + while i < len(children): + elem = children[i] + if elem.tag.endswith('}tbl'): + # 检查是否是表头表格 + text = ''.join(elem.itertext()).strip().lower() + if 'abb' in text and 'project' in text and 'result' in text: + # 这是表头表格,检查前面是否有分页符 + if i > 0: + prev_elem = children[i - 1] + if prev_elem.tag.endswith('}p'): + br = prev_elem.find(f'.//{w_ns}br') + if br is not None and br.get(f'{w_ns}type') == 'page': + prev_text = ''.join(prev_elem.itertext()).strip() + if not prev_text: + try: + body.remove(prev_elem) + children = list(body) + removed_header_pagebreaks += 1 + continue # 不增加i + except: + pass + i += 1 + + if removed_pagebreaks > 0: + print(f" 🧹 清理 {removed_pagebreaks} 个连续分页符") + if removed_header_pagebreaks > 0: + print(f" 🧹 清理表头前 
{removed_header_pagebreaks} 个多余分页符") + + # 使用安全保存 + safe_save(doc, output_path, template_path_local) + print(f"\n✓ 格式整理完成: 清理了 {removed_count} 个多余空白段落, 插入 {pagebreak_count} 个模块间分页符") + + return doc + + +def main(force_extract=False, use_deepseek=False, deepseek_api_key=None): + """ + 主函数 + Args: + force_extract: 是否强制重新提取数据(忽略缓存) + use_deepseek: 是否使用DeepSeek分析补充数据 + deepseek_api_key: DeepSeek API密钥 + """ + # 路径配置 + pdf_dir = r"c:\Users\UI\Desktop\医疗报告\医疗报告智能体" + template_config_path = Path(__file__).parent / "abb_mapping_config.json" + word_template_path = Path(__file__).parent / "template_complete.docx" + reports_dir = Path(__file__).parent / "reports" + reports_dir.mkdir(exist_ok=True) + from datetime import datetime + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + output_path = reports_dir / f"filled_report_{timestamp}.docx" + extracted_file = Path(__file__).parent / "extracted_medical_data.json" + + # ========== 获取保护边界位置(不备份,改为在各步骤中跳过保护区域)========== + print('\n' + '=' * 60) + print('[PROTECT] 检测保护区域边界(前四页)') + print('=' * 60) + template_doc = Document(word_template_path) + protection_boundary = find_health_program_boundary(template_doc) + print(f' 保护边界位置: {protection_boundary}') + print(f' 说明: 保护区域内的元素将在各处理步骤中被跳过') + del template_doc # 释放模板文档 + + + print("=" * 60) + print("步骤1: 获取检测数据 (百度OCR)") + print("=" * 60) + + # 检查PDF目录中的文件 + pdf_files = list(Path(pdf_dir).glob("*.pdf")) + pdf_files_info = {str(f.name): f.stat().st_mtime for f in pdf_files} + + # 检查是否需要重新提取 + need_extract = force_extract + + if not need_extract and extracted_file.exists(): + with open(extracted_file, 'r', encoding='utf-8') as f: + cached_data = json.load(f) + + # 检查缓存中记录的PDF文件信息 + cached_pdf_info = cached_data.get('pdf_files', {}) + + # 比较当前PDF文件和缓存中的文件 + if set(pdf_files_info.keys()) != set(cached_pdf_info.keys()): + # 文件列表不同(有新增或删除) + new_files = set(pdf_files_info.keys()) - set(cached_pdf_info.keys()) + removed_files = set(cached_pdf_info.keys()) - set(pdf_files_info.keys()) + 
if new_files: + print(f" 📄 检测到新增PDF文件: {', '.join(new_files)}") + if removed_files: + print(f" 📄 检测到删除PDF文件: {', '.join(removed_files)}") + need_extract = True + else: + # 检查文件修改时间 + for fname, mtime in pdf_files_info.items(): + if fname in cached_pdf_info and mtime > cached_pdf_info[fname]: + print(f" 📄 检测到PDF文件已更新: {fname}") + need_extract = True + break + else: + need_extract = True + + if not need_extract: + print(f" ✓ 发现缓存数据: {extracted_file}") + extracted_items = cached_data.get('items', []) + patient_info = cached_data.get('patient_info', {}) + print(f" ✓ 从缓存读取 {len(extracted_items)} 个检测项") + if patient_info: + print(f" ✓ 从缓存读取患者信息: {patient_info.get('name', '未知')}") + print(f" 💡 如需重新提取,请删除缓存文件或使用 --force 参数") + else: + # 重新提取 + if force_extract: + print(" 📄 强制重新提取...") + else: + print(" 📄 检测到文件变化,开始OCR提取...") + + # 提取检测数据(同时返回OCR原文,避免重复OCR) + extracted_items, ocr_texts = extract_all_pdfs(pdf_dir) + print(f"\n共提取 {len(extracted_items)} 个检测项") + + # 提取患者基本信息(复用已有的OCR文本,不再重复调用OCR) + patient_info = {} + if ocr_texts: + print("\n 📋 提取患者基本信息...") + first_ocr_text = next(iter(ocr_texts.values())) + patient_info = extract_patient_info(first_ocr_text) + print(f" 姓名: {patient_info.get('name', '未提取')}") + print(f" 性别: {patient_info.get('gender', '未提取')}") + print(f" 年龄: {patient_info.get('age', '未提取')}") + print(f" 体检时间: {patient_info.get('exam_time', '未提取')}") + print(f" 报告时间: {patient_info.get('report_time', '未提取')}") + + # 保存提取的数据(包含PDF文件信息和患者信息用于后续比较) + with open(extracted_file, 'w', encoding='utf-8') as f: + json.dump({ + 'total_items': len(extracted_items), + 'items': extracted_items, + 'pdf_files': pdf_files_info, # 记录PDF文件信息 + 'patient_info': patient_info # 记录患者信息 + }, f, ensure_ascii=False, indent=2) + print(f"✓ 数据已保存到: {extracted_file}") + + # 设置全局DeepSeek API Key + global DEEPSEEK_API_KEY + if deepseek_api_key: + DEEPSEEK_API_KEY = deepseek_api_key + + print("\n" + "=" * 60) + print("步骤2: 与模板结构匹配") + print("=" * 60) + with open(template_config_path, 'r', 
encoding='utf-8') as f: + template_config = json.load(f) + matched_data = match_with_template(extracted_items, template_config) + + # 步骤2.5: 使用DeepSeek补充参考范围和判断异常 + if use_deepseek and deepseek_api_key: + print("\n" + "=" * 60) + print("步骤2.5: 智能补充参考范围和异常判断") + print("=" * 60) + matched_data = enhance_data_with_deepseek(matched_data, deepseek_api_key) + + print("\n" + "=" * 60) + print("步骤3: 填入Word模板") + print("=" * 60) + fill_word_template(word_template_path, matched_data, output_path, deepseek_api_key, patient_info) + + # 步骤4: 处理额外检测项目 + # 注意:步骤3已经通过DeepSeek分类处理了大部分项目,这里只处理真正未被处理的项目 + print("\n" + "=" * 60) + print("步骤4: 处理额外检测项目") + print("=" * 60) + # 暂时禁用额外项目处理,因为步骤3已经通过DeepSeek分类处理了所有项目 + # 如果需要启用,需要修改extra_items_handler.py排除已在步骤3中处理的项目 + print(" ℹ️ 额外项目已在步骤3中通过DeepSeek分类处理") + # try: + # from extra_items_handler import process_extra_items + # process_extra_items(extracted_items, str(output_path), deepseek_api_key) + # except Exception as e: + # print(f" ⚠️ 额外项目处理失败: {e}") + # import traceback + # traceback.print_exc() + + # 步骤5: 填充异常指标汇总 + print("\n" + "=" * 60) + print("步骤5: 填充异常指标汇总") + print("=" * 60) + # 收集异常项目 + abnormal_items = [] + for abb, data in matched_data.items(): + point = data.get('point', '') + if point in ['↑', '↓', 'H', 'L', '高', '低']: + abnormal_items.append({ + 'abb': abb, + 'name': data.get('project', abb), + 'result': data.get('result', ''), + 'point': point, + 'reference': data.get('reference', ''), + 'unit': data.get('unit', '') + }) + + if abnormal_items: + print(f" 发现 {len(abnormal_items)} 个异常项目") + doc = Document(output_path) + from health_content_generator import fill_abnormal_index_summary, generate_item_explanations + + # 获取异常项目的临床意义解释(优先使用模板解释) + item_explanations = generate_item_explanations(abnormal_items, deepseek_api_key, call_deepseek_api if use_deepseek else None) + + fill_abnormal_index_summary(doc, abnormal_items, item_explanations) + # 使用安全保存 + from xml_safe_save import safe_save + safe_save(doc, output_path, 
word_template_path) + else: + print(" 没有异常项目") + + print("\n" + "=" * 60) + print("步骤6: 清理空白数据行") + print("=" * 60) + clean_empty_rows(output_path, output_path, patient_info) + + print("\n" + "=" * 60) + print("步骤7: 格式整理(表格间空行 + 模块间分页符)") + print("=" * 60) + format_document_structure(output_path, output_path) + + # 步骤8: 修复保护区域 + print("\n" + "=" * 60) + print("步骤8: 修复保护区域(前四页)") + print("=" * 60) + print(" 策略: 从原始模板复制前四页,保留所有图片和布局") + copy_protected_region_from_template(word_template_path, output_path, protection_boundary) + + # 步骤8.5: 填充患者基本信息 + print("\n" + "=" * 60) + print("步骤8.5: 填充患者基本信息") + print("=" * 60) + if patient_info and any(patient_info.values()): + doc = Document(output_path) + fill_patient_info_in_template(doc, patient_info) + doc.save(output_path) + print(f" ✓ 患者信息已填充") + else: + print(" ⚠️ 未提取到患者信息,跳过填充") + + # 步骤9: 根据异常项生成健康评估和建议内容(可选) + # 注意:必须在步骤8之后执行,因为步骤8会从模板复制前四页 + if use_deepseek and deepseek_api_key: + print("\n" + "=" * 60) + print("步骤9: 生成健康评估与建议内容") + print("=" * 60) + doc = Document(output_path) + from health_content_generator import generate_and_fill_health_content as gen_health + gen_health(doc, matched_data, deepseek_api_key, call_deepseek_api) + # 直接保存,不使用safe_save(避免覆盖分页符) + doc.save(output_path) + print(f" ✓ 健康内容已保存") + + # 步骤10: 修复页脚(确保所有页面都有 Be.U Med logo) + print("\n" + "=" * 60) + print("步骤10: 修复页脚") + print("=" * 60) + fix_footer_reference(word_template_path, output_path) + + print("\n" + "=" * 60) + print("✅ 全部完成!") + print(f"✅ 输出文件: {output_path}") + print("=" * 60) + +if __name__ == '__main__': + import os + + force = '--force' in sys.argv or '-f' in sys.argv + # 默认启用 DeepSeek 分析 + use_deepseek = '--no-deepseek' not in sys.argv + + # 获取DeepSeek API Key(命令行参数 --api-key/-k 优先;未提供时使用代码中的默认值,其次环境变量) + deepseek_key = DEEPSEEK_API_KEY or os.environ.get('DEEPSEEK_API_KEY', '') + for i, arg in enumerate(sys.argv): + if arg in ['--api-key', '-k'] and i + 1 < len(sys.argv): + deepseek_key = sys.argv[i + 1] + break + + if use_deepseek and not
deepseek_key: + print("⚠️ 使用DeepSeek需要提供API Key") + print(" 方法1: 在代码中设置 DEEPSEEK_API_KEY") + print(" 方法2: 设置环境变量 DEEPSEEK_API_KEY") + print(" 方法3: 使用参数 --api-key YOUR_KEY") + sys.exit(1) + + print("=" * 60) + print(" 医疗报告智能提取与填充系统") + print("=" * 60) + print(f" OCR提取: 百度高精度OCR") + print(f" 智能分析: {'DeepSeek ✓' if use_deepseek else '关闭'}") + print(f" 强制刷新: {'是' if force else '否'}") + print("=" * 60) + + main(force_extract=force, use_deepseek=use_deepseek, deepseek_api_key=deepseek_key) diff --git a/backend/extract_template_explanations.py b/backend/extract_template_explanations.py new file mode 100644 index 0000000..8332ba6 --- /dev/null +++ b/backend/extract_template_explanations.py @@ -0,0 +1,140 @@ +""" +从模板文件中提取所有检测项目的临床意义 +""" +from docx import Document +import json +import re + +def extract_all_explanations(): + doc = Document('template_complete.docx') + + explanations = {} + + # 遍历所有表格 + for table_idx, table in enumerate(doc.tables): + rows = table.rows + if len(rows) < 2: + continue + + # 检查是否是检测项目表格(通过表头判断) + header_text = ' '.join([cell.text.strip() for cell in rows[0].cells]) + + # 遍历每一行 + current_abb = None + for row_idx, row in enumerate(rows): + cells = row.cells + if not cells: + continue + + # 获取第一列文本(通常是ABB) + first_cell_text = cells[0].text.strip() + + # 跳过表头行 + if 'Abb' in first_cell_text or '简称' in first_cell_text: + continue + + # 检查是否是ABB行(短文本,不是临床意义) + if first_cell_text and len(first_cell_text) < 40: + if not first_cell_text.startswith('Clinical') and not first_cell_text.startswith('临床'): + # 可能是ABB + current_abb = first_cell_text + + # 查找临床意义 + for cell in cells: + text = cell.text.strip() + if 'Clinical Significance:' in text and '临床意义:' in text: + # 提取英文和中文 + parts = text.split('临床意义:') + if len(parts) == 2: + en = parts[0].replace('Clinical Significance:', '').strip() + cn = parts[1].strip() + + if current_abb and en and cn: + # 标准化ABB名称 + abb_key = current_abb.upper().strip() + # 处理特殊字符 + abb_key = abb_key.replace(' - ', '-').replace('(', 
'(').replace(')', ')') + + if abb_key not in explanations: + explanations[abb_key] = { + 'clinical_en': en, + 'clinical_cn': cn + } + print(f'提取: {abb_key}') + + return explanations + +def main(): + print('从模板提取临床意义...') + print('=' * 60) + + template_explanations = extract_all_explanations() + print(f'\n从模板提取了 {len(template_explanations)} 个项目') + + # 读取现有文件 + try: + with open('template_explanations.json', 'r', encoding='utf-8') as f: + existing = json.load(f) + print(f'现有文件中有 {len(existing)} 个项目') + except: + existing = {} + print('创建新文件') + + # 用模板内容更新(模板优先) + updated_count = 0 + for abb, exp in template_explanations.items(): + if abb not in existing or existing[abb] != exp: + existing[abb] = exp + updated_count += 1 + + # 保存 + with open('template_explanations.json', 'w', encoding='utf-8') as f: + json.dump(existing, f, ensure_ascii=False, indent=2) + + print(f'\n更新了 {updated_count} 个项目') + print(f'最终文件包含 {len(existing)} 个项目') + + # 检查配置文件中的项目是否都有临床意义 + print('\n' + '=' * 60) + print('检查配置文件中的项目覆盖情况...') + + with open('abb_mapping_config.json', 'r', encoding='utf-8') as f: + config = json.load(f) + + config_abbs = set() + for module_name, module_data in config.get('modules', {}).items(): + for item in module_data.get('items', []): + abb = item.get('abb', '').upper().strip() + abb = abb.replace(' - ', '-').replace('(', '(').replace(')', ')') + config_abbs.add(abb) + + # 检查缺失 + missing = [] + for abb in config_abbs: + if abb not in existing: + # 尝试一些变体 + found = False + variants = [ + abb, + abb.replace('-', ' '), + abb.replace(' ', '-'), + abb.replace('%', ''), + abb + ' COUNT', + abb + ' TYPE', + ] + for v in variants: + if v in existing: + found = True + break + if not found: + missing.append(abb) + + if missing: + print(f'\n缺失临床意义的项目 ({len(missing)}):') + for abb in sorted(missing): + print(f' {abb}') + else: + print('\n所有配置项目都有临床意义!') + +if __name__ == '__main__': + main() diff --git a/backend/extracted_medical_data.json b/backend/extracted_medical_data.json 
new file mode 100644 index 0000000..075ee54 --- /dev/null +++ b/backend/extracted_medical_data.json @@ -0,0 +1,989 @@ +{ + "total_items": 108, + "items": [ + { + "abb": "ALB", + "project": "白蛋白", + "result": "10", + "point": "", + "unit": "mg/L", + "reference": "<=20", + "source": "1125041700091(1).pdf" + }, + { + "abb": "pH", + "project": "酸碱度", + "result": "6.5", + "point": "", + "unit": "", + "reference": "4.5-8", + "source": "1125041700091(1).pdf" + }, + { + "abb": "WBC", + "project": "白细胞计数(WBC)", + "result": "5.1", + "point": "", + "unit": "x10^9/L", + "reference": "3.5-9.5", + "source": "1125041700091(1).pdf" + }, + { + "abb": "NEUT%", + "project": "中性粒细胞百分率(NEUT%)", + "result": "43.9", + "point": "", + "unit": "%", + "reference": "40-75", + "source": "1125041700091(1).pdf" + }, + { + "abb": "LYMPH%", + "project": "淋巴细胞百分率(LYMPH%)", + "result": "45.7", + "point": "", + "unit": "%", + "reference": "20-50", + "source": "1125041700091(1).pdf" + }, + { + "abb": "MONO%", + "project": "单核细胞百分率(MONO%)", + "result": "7.5", + "point": "", + "unit": "%", + "reference": "3-10", + "source": "1125041700091(1).pdf" + }, + { + "abb": "EOS%", + "project": "嗜酸性粒细胞百分率(EO%)", + "result": "2.3", + "point": "", + "unit": "%", + "reference": "0.4-8", + "source": "1125041700091(1).pdf" + }, + { + "abb": "BAS%", + "project": "嗜碱性粒细胞百分率(BASO%)", + "result": "0.6", + "point": "", + "unit": "%", + "reference": "<=1", + "source": "1125041700091(1).pdf" + }, + { + "abb": "NEUT", + "project": "中性粒细胞数(NEUT#)", + "result": "2.3", + "point": "", + "unit": "x10^9/L", + "reference": "1.8-6.3", + "source": "1125041700091(1).pdf" + }, + { + "abb": "LYMPH", + "project": "淋巴细胞数(LYMPH#)", + "result": "2.4", + "point": "", + "unit": "x10^9/L", + "reference": "1.1-3.2", + "source": "1125041700091(1).pdf" + }, + { + "abb": "MONO", + "project": "单核细胞数(MONO#)", + "result": "0.39", + "point": "", + "unit": "x10^9/L", + "reference": "0.1-0.6", + "source": "1125041700091(1).pdf" + }, + { + "abb": "EOS", + 
"project": "嗜酸性粒细胞数(EO#)", + "result": "0.12", + "point": "", + "unit": "x10^9/L", + "reference": "0.02-0.52", + "source": "1125041700091(1).pdf" + }, + { + "abb": "BAS", + "project": "嗜碱性粒细胞数(BASO#)", + "result": "0.03", + "point": "", + "unit": "x10^9/L", + "reference": "<=0.06", + "source": "1125041700091(1).pdf" + }, + { + "abb": "RBC", + "project": "红细胞计数(RBC)", + "result": "3.77", + "point": "↓", + "unit": "x10^12/L", + "reference": "4.3-5.8", + "source": "1125041700091(1).pdf" + }, + { + "abb": "Hb", + "project": "血红蛋白量(HGB)", + "result": "123", + "point": "↓", + "unit": "g/L", + "reference": "130-175", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HCT", + "project": "红细胞比积(HCT)", + "result": "38", + "point": "↓", + "unit": "%", + "reference": "40-50", + "source": "1125041700091(1).pdf" + }, + { + "abb": "MCV", + "project": "平均红细胞体积(MCV)", + "result": "100", + "point": "", + "unit": "fL", + "reference": "82-100", + "source": "1125041700091(1).pdf" + }, + { + "abb": "MCH", + "project": "平均红细胞血红蛋白量(MCH)", + "result": "33", + "point": "", + "unit": "pg", + "reference": "27-34", + "source": "1125041700091(1).pdf" + }, + { + "abb": "MCHC", + "project": "平均红细胞血红蛋白浓度(MCHC)", + "result": "326", + "point": "", + "unit": "g/L", + "reference": "316-354", + "source": "1125041700091(1).pdf" + }, + { + "abb": "RDW-SD", + "project": "红细胞分布宽度-标准差(RDW-SD)", + "result": "45", + "point": "", + "unit": "fL", + "reference": "37-50", + "source": "1125041700091(1).pdf" + }, + { + "abb": "RDW", + "project": "红细胞分布宽度-变异系数(RDW-CV)", + "result": "12.0", + "point": "", + "unit": "%", + "reference": "11.6-14.4", + "source": "1125041700091(1).pdf" + }, + { + "abb": "PLT", + "project": "血小板计数(PLT)", + "result": "163", + "point": "", + "unit": "x10^9/L", + "reference": "125-350", + "source": "1125041700091(1).pdf" + }, + { + "abb": "PCT", + "project": "血小板比积(PCT)", + "result": "0.18", + "point": "", + "unit": "%", + "reference": "0.17-0.35", + "source": "1125041700091(1).pdf" + }, 
+ { + "abb": "MPV", + "project": "平均血小板体积(MPV)", + "result": "10.9", + "point": "", + "unit": "fL", + "reference": "9-13", + "source": "1125041700091(1).pdf" + }, + { + "abb": "PDW", + "project": "血小板分布宽度(PDW)", + "result": "16.0", + "point": "", + "unit": "fL", + "reference": "9-17", + "source": "1125041700091(1).pdf" + }, + { + "abb": "P-LCR", + "project": "大型血小板比率(P-LCR)", + "result": "31", + "point": "", + "unit": "%", + "reference": "13-43", + "source": "1125041700091(1).pdf" + }, + { + "abb": "TBil", + "project": "总胆红素", + "result": "8.3", + "point": "", + "unit": "umol/L", + "reference": "3-26", + "source": "1125041700091(1).pdf" + }, + { + "abb": "DBil", + "project": "直接胆红素", + "result": "1.7", + "point": "", + "unit": "umol/L", + "reference": "<=7", + "source": "1125041700091(1).pdf" + }, + { + "abb": "IBil", + "project": "间接胆红素", + "result": "6.6", + "point": "", + "unit": "umol/L", + "reference": "1.7-17", + "source": "1125041700091(1).pdf" + }, + { + "abb": "TP", + "project": "总蛋白", + "result": "72.4", + "point": "", + "unit": "g/L", + "reference": "65-85", + "source": "1125041700091(1).pdf" + }, + { + "abb": "ALB", + "project": "白蛋白", + "result": "44.8", + "point": "", + "unit": "g/L", + "reference": "40-55", + "source": "1125041700091(1).pdf" + }, + { + "abb": "GLB", + "project": "球蛋白", + "result": "27.6", + "point": "", + "unit": "g/L", + "reference": "20-40", + "source": "1125041700091(1).pdf" + }, + { + "abb": "A/G", + "project": "白球比值", + "result": "1.6", + "point": "", + "unit": "", + "reference": "1.2-2.4", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CHE", + "project": "胆碱酯酶", + "result": "290", + "point": "", + "unit": "U/L", + "reference": "203-460", + "source": "1125041700091(1).pdf" + }, + { + "abb": "ALT", + "project": "谷丙转氨酶", + "result": "16", + "point": "", + "unit": "U/L", + "reference": "9-50", + "source": "1125041700091(1).pdf" + }, + { + "abb": "AST", + "project": "谷草转氨酶", + "result": "25", + "point": "", + "unit": "U/L", + 
"reference": "15-40", + "source": "1125041700091(1).pdf" + }, + { + "abb": "GGT", + "project": "γ-谷氨酰基转移酶", + "result": "22", + "point": "", + "unit": "U/L", + "reference": "10-60", + "source": "1125041700091(1).pdf" + }, + { + "abb": "ALP", + "project": "碱性磷酸酶", + "result": "61", + "point": "", + "unit": "U/L", + "reference": "45-125", + "source": "1125041700091(1).pdf" + }, + { + "abb": "Tf", + "project": "转铁蛋白", + "result": "2.43", + "point": "", + "unit": "g/L", + "reference": "2.00-3.60", + "source": "1125041700091(1).pdf" + }, + { + "abb": "Tf", + "project": "转铁蛋白", + "result": "43.57", + "point": "", + "unit": "mg/L", + "reference": "25.80-65.70", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CysC", + "project": "胱抑素C", + "result": "0.90", + "point": "", + "unit": "mg/L", + "reference": "0.55-1.05", + "source": "1125041700091(1).pdf" + }, + { + "abb": "β2-MG", + "project": "血清β2微球蛋白", + "result": "1.8", + "point": "", + "unit": "mg/L", + "reference": "1.0-2.3", + "source": "1125041700091(1).pdf" + }, + { + "abb": "ALB", + "project": "白蛋白", + "result": "1.0", + "point": "", + "unit": "mg/L", + "reference": "<=20", + "source": "1125041700091(1).pdf" + }, + { + "abb": "GLB", + "project": "球蛋白", + "result": "70", + "point": "", + "unit": "ug/L", + "reference": "<=300", + "source": "1125041700091(1).pdf" + }, + { + "abb": "TG", + "project": "甘油三酯", + "result": "1.37", + "point": "", + "unit": "mmol/L", + "reference": "0.45-1.69", + "source": "1125041700091(1).pdf" + }, + { + "abb": "TC", + "project": "总胆固醇", + "result": "4.67", + "point": "", + "unit": "mmol/L", + "reference": "2.33-5.17", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HDL", + "project": "高密度脂蛋白胆固醇", + "result": "1.52", + "point": "", + "unit": "mmol/L", + "reference": "0.91-2.06", + "source": "1125041700091(1).pdf" + }, + { + "abb": "LDL", + "project": "低密度脂蛋白胆固醇", + "result": "2.50", + "point": "", + "unit": "mmol/L", + "reference": "2.07-3.36", + "source": "1125041700091(1).pdf" 
+ }, + { + "abb": "FFA", + "project": "游离脂肪酸", + "result": "0.66", + "point": "", + "unit": "mmol/L", + "reference": "0.1-0.77", + "source": "1125041700091(1).pdf" + }, + { + "abb": "INS", + "project": "胰岛素(空腹)", + "result": "8.3", + "point": "", + "unit": "μU/ml", + "reference": "2.6-24.9", + "source": "1125041700091(1).pdf" + }, + { + "abb": "FBS", + "project": "葡萄糖(空腹)", + "result": "5.41", + "point": "", + "unit": "mmol/L", + "reference": "3.9-6.1", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HbA1C", + "project": "糖化血红蛋白", + "result": "5.6", + "point": "", + "unit": "%", + "reference": "4.0-6.5", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CK", + "project": "肌酸激酶", + "result": "136", + "point": "", + "unit": "U/L", + "reference": "50-310", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CK-MB", + "project": "肌酸激酶同工酶MB", + "result": "11", + "point": "", + "unit": "U/L", + "reference": "<=24", + "source": "1125041700091(1).pdf" + }, + { + "abb": "hs-CRP", + "project": "超敏C反应蛋白", + "result": "0.5", + "point": "", + "unit": "mg/L", + "reference": "<=3.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "Hcy", + "project": "同型半胱氨酸", + "result": "9.7", + "point": "", + "unit": "umol/L", + "reference": "<=15", + "source": "1125041700091(1).pdf" + }, + { + "abb": "Lp(a)", + "project": "脂蛋白(a)", + "result": "26", + "point": "", + "unit": "mg/L", + "reference": "<=300", + "source": "1125041700091(1).pdf" + }, + { + "abb": "Tg", + "project": "甲状腺球蛋白", + "result": "8.8", + "point": "", + "unit": "ng/ml", + "reference": "3.5-77.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "T3", + "project": "三碘甲状腺原氨酸T3", + "result": "1.31", + "point": "", + "unit": "nmol/L", + "reference": "1.3-2.4", + "source": "1125041700091(1).pdf" + }, + { + "abb": "T4", + "project": "甲状腺素T4", + "result": "99.0", + "point": "", + "unit": "nmol/L", + "reference": "70-140", + "source": "1125041700091(1).pdf" + }, + { + "abb": "FT3", + "project": "游离三碘甲状腺原氨酸FT3", 
+ "result": "4.35", + "point": "", + "unit": "pmol/L", + "reference": "3.82-6.30", + "source": "1125041700091(1).pdf" + }, + { + "abb": "FT4", + "project": "游离甲状腺素FT4", + "result": "14.30", + "point": "", + "unit": "pmol/L", + "reference": "12.80-21.30", + "source": "1125041700091(1).pdf" + }, + { + "abb": "TSH", + "project": "促甲状腺素TSH", + "result": "1.55", + "point": "", + "unit": "mIU/L", + "reference": "0.75-5.60", + "source": "1125041700091(1).pdf" + }, + { + "abb": "TgAb", + "project": "抗甲状腺球蛋白抗体", + "result": "16.5", + "point": "", + "unit": "IU/ml", + "reference": "<115.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "TPO-Ab", + "project": "抗甲状腺过氧化物酶抗体", + "result": "13.1", + "point": "", + "unit": "IU/ml", + "reference": "<34.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "PGI", + "project": "胃蛋白酶原I", + "result": "98.4", + "point": "", + "unit": "ng/ml", + "reference": ">=30", + "source": "1125041700091(1).pdf" + }, + { + "abb": "G-17", + "project": "胃泌素-17", + "result": "2.9", + "point": "", + "unit": "pmol/L", + "reference": "1.7-7.6", + "source": "1125041700091(1).pdf" + }, + { + "abb": "PGII", + "project": "胃蛋白酶原Ⅱ", + "result": "11.1", + "point": "", + "unit": "ng/ml", + "reference": "", + "source": "1125041700091(1).pdf" + }, + { + "abb": "PGR", + "project": "胃蛋白酶原比值", + "result": "8.9", + "point": "", + "unit": "", + "reference": ">=3", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HBsAg", + "project": "乙肝表面抗原", + "result": "0.87", + "point": "", + "unit": "COI", + "reference": "<1.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HBsAb", + "project": "乙肝表面抗体", + "result": "<2.00", + "point": "", + "unit": "IU/L", + "reference": "<10.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HBeAg", + "project": "乙肝e抗原", + "result": "0.10", + "point": "", + "unit": "COI", + "reference": "<1.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HBeAb", + "project": "乙肝e抗体", + "result": "1.40", + "point": "", + 
"unit": "COI", + "reference": ">1.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HBcAb", + "project": "乙肝核心抗体", + "result": "0.01", + "point": "", + "unit": "COI", + "reference": ">1.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "HBcAb", + "project": "乙肝核心抗体", + "result": "阳性", + "point": "", + "unit": "", + "reference": "", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CRP", + "project": "C反应蛋白", + "result": "0.5", + "point": "", + "unit": "mg/L", + "reference": "<=6.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "ASO", + "project": "抗链球菌溶血素\"0\"", + "result": "32", + "point": "", + "unit": "IU/ml", + "reference": "<=160", + "source": "1125041700091(1).pdf" + }, + { + "abb": "ANA", + "project": "抗核抗体", + "result": "0.9", + "point": "", + "unit": "AU/ml", + "reference": "<40.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "RF", + "project": "类风湿因子", + "result": "5", + "point": "", + "unit": "IU/ml", + "reference": "<=20", + "source": "1125041700091(1).pdf" + }, + { + "abb": "PTH", + "project": "甲状旁腺素", + "result": "5.9", + "point": "", + "unit": "pmol/L", + "reference": "1.6-6.9", + "source": "1125041700091(1).pdf" + }, + { + "abb": "OST", + "project": "骨钙素", + "result": "15.4", + "point": "", + "unit": "ng/ml", + "reference": "5.58-28.62", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VitB12", + "project": "维生素B12", + "result": "497", + "point": "", + "unit": "pmol/L", + "reference": "145-569", + "source": "1125041700091(1).pdf" + }, + { + "abb": "Fer", + "project": "血清铁蛋白", + "result": "86", + "point": "", + "unit": "ng/ml", + "reference": "31.3-408.5", + "source": "1125041700091(1).pdf" + }, + { + "abb": "Folate", + "project": "维生素B9(叶酸)血药浓度测定", + "result": "12.18", + "point": "", + "unit": "ng/ml", + "reference": ">4", + "source": "1125041700091(1).pdf" + }, + { + "abb": "25-OH-VD2+D3", + "project": "25-羟基维生素D血药浓度测定", + "result": "19.91", + "point": "↓", + "unit": "ng/ml", + "reference": "30-100", + 
"source": "1125041700091(1).pdf" + }, + { + "abb": "VitA", + "project": "维生素A血药浓度测定", + "result": "564.54", + "point": "", + "unit": "ng/ml", + "reference": "325-780", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VD3", + "project": "25-羟基维生素D3血药浓度测定", + "result": "19.63", + "point": "", + "unit": "ng/ml", + "reference": "无参考范围", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VD2", + "project": "25-羟基维生素D2血药浓度测定", + "result": "0.28", + "point": "", + "unit": "ng/ml", + "reference": "无参考范围", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VitE", + "project": "维生素E血药浓度测定", + "result": "9.29", + "point": "", + "unit": "ug/ml", + "reference": "5--18", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VitB2", + "project": "维生素B2血药浓度测定", + "result": "7.90", + "point": "", + "unit": "ng/ml", + "reference": "2.33-14.69", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VitB1", + "project": "维生素B1血药浓度测定", + "result": "1.67", + "point": "↓", + "unit": "ng/ml", + "reference": "2.4-9.02", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VitB5", + "project": "维生素B5血药浓度测定", + "result": "50.25", + "point": "", + "unit": "ng/ml", + "reference": "12.9-253.1", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VitB3", + "project": "维生素B3血药浓度测定", + "result": "29.62", + "point": "", + "unit": "ng/ml", + "reference": "5.2-72.1", + "source": "1125041700091(1).pdf" + }, + { + "abb": "VitB6", + "project": "维生素B6血药浓度测定", + "result": "4.67", + "point": "↓", + "unit": "ng/ml", + "reference": "4.9-30.9", + "source": "1125041700091(1).pdf" + }, + { + "abb": "AFP", + "project": "甲胎蛋白", + "result": "0.5", + "point": "", + "unit": "ng/ml", + "reference": "<=7.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CEA", + "project": "癌胚抗原", + "result": "1.3", + "point": "", + "unit": "ng/ml", + "reference": "<=5", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CA19-9", + "project": "糖类抗原19-9", + "result": "9.9", + "point": "", + "unit": "U/ml", + 
"reference": "<=30", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CA72-4", + "project": "糖类抗原72-4", + "result": "2.6", + "point": "", + "unit": "U/ml", + "reference": "<6.9", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CA24-2", + "project": "糖类抗原24-2", + "result": "7.3", + "point": "", + "unit": "U/ml", + "reference": "<20.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CA50", + "project": "糖类抗原50", + "result": "8.0", + "point": "", + "unit": "U/ml", + "reference": "<25.0", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CA125", + "project": "糖类抗原125", + "result": "4.9", + "point": "", + "unit": "U/ml", + "reference": "<=24", + "source": "1125041700091(1).pdf" + }, + { + "abb": "NSE", + "project": "神经元特异性烯醇化酶", + "result": "2.6", + "point": "", + "unit": "ng/ml", + "reference": "<16.3", + "source": "1125041700091(1).pdf" + }, + { + "abb": "CYFRA21-1", + "project": "细胞角蛋白19片段", + "result": "1.6", + "point": "", + "unit": "ng/ml", + "reference": "<3.3", + "source": "1125041700091(1).pdf" + }, + { + "abb": "ProGRP", + "project": "胃泌素释放肽前体", + "result": "30.0", + "point": "", + "unit": "pg/ml", + "reference": "28.3-74.4", + "source": "1125041700091(1).pdf" + }, + { + "abb": "SCC", + "project": "鳞状细胞癌相关抗原", + "result": "2.32", + "point": "", + "unit": "ng/ml", + "reference": "<=2.7", + "source": "1125041700091(1).pdf" + }, + { + "abb": "TPSA", + "project": "总前列腺特异抗原", + "result": "0.48", + "point": "", + "unit": "ng/ml", + "reference": "<=4", + "source": "1125041700091(1).pdf" + }, + { + "abb": "FPSA", + "project": "游离前列腺特异抗原", + "result": "0.25", + "point": "", + "unit": "ng/ml", + "reference": "<=0.93", + "source": "1125041700091(1).pdf" + }, + { + "abb": "F/TPSA", + "project": "游离PSA/总PSA", + "result": "0.52", + "point": "", + "unit": "", + "reference": "0.25-1", + "source": "1125041700091(1).pdf" + } + ], + "pdf_files": { + "1125041700091(1).pdf": 1770859127.7919312 + }, + "patient_info": { + "name": "姚友胜", + "gender": "男性", + 
"age": "59", + "nation": "中国", + "exam_time": "2012-50-41", + "project": "功能医学检测套餐", + "report_time": "2026-02-12" + } +} \ No newline at end of file diff --git a/backend/fill_with_docxtpl.py b/backend/fill_with_docxtpl.py new file mode 100644 index 0000000..669c396 --- /dev/null +++ b/backend/fill_with_docxtpl.py @@ -0,0 +1,1044 @@ +""" +使用docxtpl填充Word模板 +""" +from docxtpl import DocxTemplate +import json +from pathlib import Path + + +def clean_extracted_data(items: list) -> list: + """清理提取的数据,分离单位和参考范围,过滤无效数据""" + import re + + cleaned = [] + + for item in items: + result = item.get('result', '') + unit = item.get('unit', '') + reference = item.get('reference', '') + project = item.get('project', '') + + # 跳过无效数据 + if result in ['.', ':', '-', '/', '', None]: + # 检查unit中是否有实际结果(如 "Yellow [Normal...]") + if unit: + # 提取unit开头的结果值 + result_in_unit = re.match(r'^([A-Za-z]+)\s*\[', unit) + if result_in_unit: + item['result'] = result_in_unit.group(1) + unit = re.sub(r'^[A-Za-z]+\s*', '', unit) + else: + continue # 跳过无效数据 + else: + continue + + # 跳过明显错误的project(如包含Phase、antibody等) + if any(kw in project.lower() for kw in ['phase', 'antibody', 'treponema']): + # 这些可能是OCR错误识别的行 + abb = item.get('abb', '').upper() + if abb in ['PH', 'CU', 'CL', 'CA']: # 这些ABB容易被误匹配 + continue + + # 如果unit包含[Normal...]或(...)范围信息,分离出来 + if unit: + # 匹配 [Normal : xxx] 或 [正常 : xxx] + normal_match = re.search(r'\[Normal\s*[::]\s*([^\]]+)\]', unit, re.IGNORECASE) + if normal_match: + if not reference: + item['reference'] = normal_match.group(1).strip() + unit = re.sub(r'\[Normal\s*[::][^\]]+\]', '', unit, flags=re.IGNORECASE).strip() + + # 匹配 (xxx-xxx) 范围 + range_match = re.search(r'\([\d\.\-<>]+\)', unit) + if range_match and not reference: + item['reference'] = range_match.group(0) + unit = re.sub(r'\([\d\.\-<>]+\)', '', unit).strip() + + # 清理开头的数字(可能是错误解析) + unit = re.sub(r'^-?\d+\s*', '', unit).strip() + + item['unit'] = unit + + cleaned.append(item) + + return cleaned + + +def 
select_best_match(items_by_abb: dict) -> dict: + """当同一ABB有多个条目时,选择最佳的一个""" + import re + + best = {} + for abb, items in items_by_abb.items(): + if len(items) == 1: + best[abb] = items[0] + else: + # 选择有有效数值结果的 + scored = [] + for item in items: + score = 0 + result = item.get('result', '') + + # 有数值结果加分 + if re.search(r'\d+\.?\d*', result): + score += 10 + + # 有参考范围加分 + if item.get('reference'): + score += 5 + + # 有单位加分 + if item.get('unit') and len(item.get('unit', '')) < 20: + score += 3 + + # 定性结果(Negative/Positive等)也有效 + if result.lower() in ['negative', 'positive', 'normal', 'reactive', 'non-reactive']: + score += 8 + + scored.append((score, item)) + + # 选择得分最高的 + scored.sort(key=lambda x: x[0], reverse=True) + best[abb] = scored[0][1] + + return best + + +def build_context(matched_data: dict) -> dict: + """ + 将匹配数据转换为docxtpl上下文格式 + + Args: + matched_data: {ABB: {result, unit, reference, point}} + + Returns: + docxtpl context dict + """ + import re + context = {} + + # 模块映射(根据project名称和ABB推断模块) + def get_module(abb, project, result): + abb_upper = abb.upper() + project_lower = project.lower() + result_lower = result.lower() if result else '' + + # 尿检特有项目 + urine_projects = ['color', 'specific gravity', 'protein', 'glucose', 'ketone', + 'nitrite', 'turbidity', '颜色', '比重', '蛋白', '糖', '酮体', '亚硝酸'] + if any(kw in project_lower for kw in urine_projects): + return 'URINE' + + # 尿检WBC特征:project是"WBC"且result是小数字或Negative/Positive + if abb_upper == 'WBC' and project_lower == 'wbc': + return 'URINE' + if abb_upper == 'WBC' and 'total' in project_lower: + return 'CBC' + + # pH在尿检中 + if abb_upper == 'PH' and 'ph' in project_lower and len(project) < 20: + return 'URINE' + + # 定性结果通常是尿检 + if abb_upper in ['PRO', 'GLU', 'KET', 'NIT', 'BLD'] and result_lower in ['negative', 'positive', 'trace']: + return 'URINE' + + return '' + + # 重复ABB列表 + duplicate_abbs = ['PRO', 'WBC', 'COLOR', 'PH', 'GLU', 'SG', 'NIT', 'KET', 'BLD', 'ERY'] + + # ABB别名映射:提取数据ABB -> 模板变量名格式 + # 解决如 CA153 
vs CA15_3、CA199 vs CA19_9 的格式差异 + abb_aliases = { + 'CA153': 'CA15_3', + 'CA199': 'CA19_9', + 'ABO': 'BLOODTYPE', # ABO血型 -> BLOODTYPE + 'RH': 'BLOODTYPERH', # Rh血型 -> BLOODTYPERH + 'CKMB': 'CK_MB', # 心肌酶 + } + + for abb, data in matched_data.items(): + # 标准化变量名(只保留字母数字下划线) + var_name = abb.replace('-', '_').replace('/', '_').replace('%', 'pct') + var_name = re.sub(r'[^a-zA-Z0-9_]', '', var_name) + + # 检查是否有别名映射 + abb_upper = abb.upper() + if abb_upper in abb_aliases: + alias_var = abb_aliases[abb_upper] + # 同时生成别名格式的变量 + context[f"{alias_var}_result"] = data.get('result', '') + context[f"{alias_var}_point"] = data.get('point', '') + context[f"{alias_var}_refer"] = data.get('reference', '') + context[f"{alias_var}_unit"] = data.get('unit', '') + if not var_name or var_name[0].isdigit(): + var_name = 'V_' + var_name + + # 对于重复ABB,根据project推断模块并添加前缀 + if abb.upper() in duplicate_abbs: + module = get_module(abb, data.get('project', ''), data.get('result', '')) + if module: + var_name_with_module = f"{module}_{var_name}" + context[f"{var_name_with_module}_result"] = data.get('result', '') + context[f"{var_name_with_module}_point"] = data.get('point', '') + context[f"{var_name_with_module}_refer"] = data.get('reference', '') + context[f"{var_name_with_module}_unit"] = data.get('unit', '') + + # 同时保留不带前缀的(兼容) + context[f"{var_name}_result"] = data.get('result', '') + context[f"{var_name}_point"] = data.get('point', '') + context[f"{var_name}_refer"] = data.get('reference', '') + context[f"{var_name}_unit"] = data.get('unit', '') + + return context + + +def fill_template(template_path: str, matched_data: dict, output_path: str): + """ + 使用docxtpl填充模板 + + Args: + template_path: docxtpl格式的模板路径 + matched_data: 匹配的数据 + output_path: 输出文件路径 + """ + doc = DocxTemplate(template_path) + + # 构建上下文 + context = build_context(matched_data) + + print(f"准备填充 {len(context)} 个变量") + + # 渲染 + doc.render(context) + + # 保存 + doc.save(output_path) + print(f"[OK] 已保存到: {output_path}") + + 
return doc + + +def clean_empty_rows(doc_path: str, output_path: str): + """清理空白数据行,合并表格""" + from docx import Document + from docx.text.paragraph import Paragraph as EarlyPara + import re + import copy + + doc = Document(doc_path) + + # === 首先删除"异常指标汇总"区域的所有表格 === + # 这些表格在第一个检测模块之前,不应该存在 + body_early = doc._body._body + children_early = list(body_early) + + # 检测模块关键词(必须精确匹配检测模块标题) + detection_kw = ['urine detection', '尿液检测', 'complete blood count', '血常规', + 'blood sugar', '血糖', 'blood lipid', '血脂', 'liver function', '肝功能', + 'kidney function', '肾功能', 'thyroid', '甲状腺', 'coagulation', '凝血', + 'infectious', '传染病', 'electrolyte', '电解质'] + exclude_kw = ['health program', '健康方案', 'health report', '健康报告', + 'abnormal', '异常', 'overall', '整体', 'assessment', '评估', + 'blood glucose', 'hematology', 'hormonal', 'immunology', 'nutrition'] + + # 找第一个检测模块位置(查找精确的模块标题) + first_module_idx = len(children_early) + for idx, elem in enumerate(children_early): + if elem.tag.endswith('}p'): + try: + p = EarlyPara(elem, doc) + txt = p.text.strip().lower() + # 检测模块标题通常是短文本且包含特定关键词 + if txt and len(txt) < 80: + is_mod = any(k in txt for k in detection_kw) + is_exc = any(k in txt for k in exclude_kw) + if is_mod and not is_exc: + first_module_idx = idx + print(f" 找到第一个检测模块: 位置{idx}") + break + except: + pass + + # 删除第一个检测模块之前的所有表格(无论有无数据) + removed_early = 0 + for idx, elem in enumerate(children_early): + if idx >= first_module_idx: + break + if elem.tag.endswith('}tbl'): + try: + elem.getparent().remove(elem) + removed_early += 1 + except: + pass + + if removed_early > 0: + print(f"[OK] 删除异常指标汇总区域表格: {removed_early} 个") + + removed_rows = 0 + merged_count = 0 + + def has_data_in_row(cells): + # 有效的定性结果列表 + valid_qualitative = [ + 'negative', 'positive', 'normal', 'reactive', 'non-reactive', + 'trace', 'clear', 'cloudy', 'turbid', + 'yellow', 'pale yellow', 'dark yellow', 'amber', 'straw', # 尿液颜色 + 'red', 'brown', 'green', 'orange', + 'a', 'b', 'ab', 'o', 'rh+', 'rh-', # 血型 + 'detected', 
'not detected', 'present', 'absent' + ] + + # 只以“Result列”判断是否有数据,避免把 Project/Refer 误判为结果 + # 模板结构通常为: + # - 11列:0 ABB, 1-2 Project, 3-4 Result, 5-6 Point, 7-8 Refer, 9-10 Unit + # - 6列:0 ABB, 1 Project, 2 Result, 3 Point, 4 Refer, 5 Unit + if len(cells) >= 11: + result_col_candidates = [3, 4] + elif len(cells) >= 6: + result_col_candidates = [2, 3] + else: + result_col_candidates = [2] + + result_candidates = [] + for col_idx in result_col_candidates: + if col_idx < len(cells): + txt = (cells[col_idx].text or '').strip() + if txt: + result_candidates.append(txt) + result_text = result_candidates[0] if result_candidates else '' + + if not result_text: + return False + if result_text in ['', '-', '/', ' ', '.', ':', '{{', '}}']: + return False + if result_text.startswith('{{'): + return False + + # 排除“范围值”形态(常出现在 Refer 列,但模板错位时也可能落到 Result/Point 列) + if re.match(r'^[\(\[]?\s*[-+]?\d+(?:\.\d+)?\s*[-–~]\s*[-+]?\d+(?:\.\d+)?\s*[\)\]]?$', result_text): + return False + + if re.search(r'\d', result_text): + return True + if result_text.lower() in valid_qualitative: + return True + if len(result_text) > 2 and result_text.isalpha(): + return True + return False + + def is_header_row(row_text, cells=None): + """精确识别表头行""" + # 先排除描述行,避免被误判为表头 + if 'clinical significance' in row_text or '临床意义' in row_text: + return False + + # 表头必须具备“Abb/简称 + Project/项目 + Result/结果”组合特征 + has_abb = ('abb' in row_text) or ('简称' in row_text) + has_project = ('project' in row_text) or ('项目' in row_text) + has_result = ('result' in row_text) or ('结果' in row_text) + if not (has_abb and has_project and has_result): + return False + + if cells: + non_empty_cells = [c for c in cells if c.text.strip()] + if len(non_empty_cells) < 2: + return False + if any(len(c.text.strip()) > 30 for c in cells): + return False + + return True + + def is_description_row(row_text): + return 'clinical significance' in row_text or '临床意义' in row_text + + def is_data_row(first_cell): + if first_cell and 1 <= 
len(first_cell) <= 20: + clean = re.sub(r'[^a-zA-Z0-9]', '', first_cell) + return bool(clean) and clean.isalnum() + return False + + def analyze_table(table): + info = {'header_idx': -1, 'desc_indices': [], 'data_with_result': [], 'data_without_result': []} + for row_idx, row in enumerate(table.rows): + cells = row.cells + if len(cells) < 2: + continue + row_text = ' '.join([c.text.strip().lower() for c in cells]) + first_cell = cells[0].text.strip() + + if is_header_row(row_text, cells): + info['header_idx'] = row_idx + elif is_description_row(row_text): + info['desc_indices'].append(row_idx) + elif is_data_row(first_cell): + if has_data_in_row(cells): + info['data_with_result'].append(row_idx) + else: + info['data_without_result'].append(row_idx) + return info + + def is_special_table(table): + try: + if len(table.rows) != 3: + return False + row2_text = ' '.join([c.text for c in table.rows[2].cells]).lower() + return ('clinical significance' in row2_text) or ('临床意义' in row2_text) + except: + return False + + def special_table_has_data(table): + try: + if len(table.rows) < 2: + return False + cells = table.rows[1].cells + if len(cells) < 3: + return False + result_text = (cells[2].text or '').strip() + if not result_text: + return False + if result_text in ['', '-', '/', '.', ':']: + return False + if result_text.startswith('{{'): + return False + return True + except: + return False + + removed_special_tables = 0 + for table in list(doc.tables): + if is_special_table(table) and not special_table_has_data(table): + try: + table._tbl.getparent().remove(table._tbl) + removed_special_tables += 1 + except: + pass + + # 获取表格顺序 + body = doc._body._body + table_order = [] + table_elem_indices = {} # 记录每个表格在body中的元素索引 + body_children = list(body) + for idx, elem in enumerate(body_children): + if elem.tag.endswith('}tbl'): + for t in doc.tables: + if t._tbl is elem: + table_order.append(t) + table_elem_indices[t] = idx + break + + # 找到第一个检测模块标题的位置(用于排除文档开头的非检测模块表格) + from 
docx.text.paragraph import Paragraph as Para + first_module_elem_idx = len(body_children) # defaults to the end + for idx, elem in enumerate(body_children): + if elem.tag.endswith('}p'): + try: + p = Para(elem, doc) + txt = p.text.strip().lower() + # check whether this is a test-module title (excluding non-test sections) + # note: module_keywords / exclude_keywords are only assigned further down in this function, + # so referencing them here raised UnboundLocalError, which the bare except swallowed; + # reuse detection_kw / exclude_kw defined above instead + if txt and len(txt) < 50: + is_module = any(kw in txt for kw in detection_kw) + is_exclude = any(kw in txt for kw in exclude_kw) + if is_module and not is_exclude: + first_module_elem_idx = idx + break + except: + pass + + # Merge tables (search only up to the next header, to avoid pulling in data across modules) + # Skip tables at the start of the document (before the first test module) so data is not merged into non-module tables + tables_to_remove = set() + for i in range(len(table_order)): + if table_order[i] in tables_to_remove: + continue + + t1 = table_order[i] + t1_elem_idx = table_elem_indices.get(t1, 0) + + # skip tables before the first test module (e.g. the abnormal-indicator summary, "异常指标汇总") + if t1_elem_idx < first_module_elem_idx: + continue + + info1 = analyze_table(t1) + + if info1['header_idx'] >= 0 and len(info1['data_with_result']) == 0: + next_header_pos = None + for k in range(i + 1, len(table_order)): + if table_order[k] in tables_to_remove: + continue + k_info = analyze_table(table_order[k]) + if k_info['header_idx'] >= 0 and len(k_info['data_with_result']) == 0: + next_header_pos = k + break + search_end = next_header_pos if next_header_pos is not None else len(table_order) + + candidates = [] + for j in range(i + 1, search_end): + if table_order[j] in tables_to_remove: + continue + candidate = table_order[j] + candidate_info = analyze_table(candidate) + if len(candidate_info['data_with_result']) > 0: + candidates.append((candidate, candidate_info)) + + if not candidates: + continue + + # use the project name of the first data row as the title + title_text = '' + try: + first_candidate, first_candidate_info = candidates[0] + if first_candidate_info.get('data_with_result'): + data_row_idx = first_candidate_info['data_with_result'][0] + if len(first_candidate.rows[data_row_idx].cells) > 1: + title_text = first_candidate.rows[data_row_idx].cells[1].text.strip() + if not title_text: + title_text =
first_candidate.rows[data_row_idx].cells[0].text.strip() + except: + title_text = '' + + # 清空:删除表头行之后所有旧行,但尽量保留表头下一行作为“标题行结构” + header_idx = info1['header_idx'] + title_row_idx = header_idx + 1 + keep_title_row = title_row_idx < len(t1.rows) + delete_from = (title_row_idx + 1) if keep_title_row else (header_idx + 1) + for ridx in range(len(t1.rows) - 1, delete_from - 1, -1): + try: + t1._tbl.remove(t1.rows[ridx]._tr) + removed_rows += 1 + except: + pass + + if not keep_title_row: + try: + new_tr = copy.deepcopy(t1.rows[header_idx]._tr) + t1._tbl.insert(title_row_idx, new_tr) + except: + pass + + try: + if title_row_idx < len(t1.rows): + title_row = t1.rows[title_row_idx] + for c in title_row.cells: + c.text = '' + if title_text: + title_row.cells[0].text = title_text + except: + pass + + for candidate, candidate_info in candidates: + for row_idx in candidate_info['data_with_result'] + candidate_info['desc_indices']: + new_tr = copy.deepcopy(candidate.rows[row_idx]._tr) + t1._tbl.append(new_tr) + + tables_to_remove.add(candidate) + merged_count += 1 + + for t in tables_to_remove: + try: + t._tbl.getparent().remove(t._tbl) + except: + pass + + # 删除逻辑: + # 1. 两个数据行都没数据 → 删除整个表格 + # 2. 
一行有数据一行没有 → 只删没数据的行,保留解释行 + tables_to_delete = [] + + for table in doc.tables: + info = analyze_table(table) + data_with = info['data_with_result'] # 有数据的行 + data_without = info['data_without_result'] # 没数据的行 + + # 情况1:所有数据行都没有数据 → 删除整个表格 + if len(data_with) == 0 and len(data_without) > 0: + tables_to_delete.append(table) + continue + + # 情况2:有些行有数据,有些没有 → 只删除没数据的行 + if len(data_with) > 0 and len(data_without) > 0: + for row_idx in sorted(data_without, reverse=True): + try: + table._tbl.remove(table.rows[row_idx]._tr) + removed_rows += 1 + except: + pass + + # 删除整个表格 + for table in tables_to_delete: + try: + table._tbl.getparent().remove(table._tbl) + removed_rows += 1 + except: + pass + + # 补全合并后的标题行(表头下一行为空时) + for table in doc.tables: + info = analyze_table(table) + if info['header_idx'] < 0: + continue + if len(info['data_with_result']) == 0: + continue + + title_row_idx = info['header_idx'] + 1 + if title_row_idx >= len(table.rows): + continue + + try: + title_row = table.rows[title_row_idx] + # 如果表头下一行本身就是数据行,则需要插入一个独立标题行 + try: + first_cell = title_row.cells[0].text.strip() if title_row.cells else '' + if is_data_row(first_cell) and has_data_in_row(title_row.cells): + extracted_title = '' + try: + if len(title_row.cells) > 1: + extracted_title = title_row.cells[1].text.strip() + if not extracted_title: + extracted_title = title_row.cells[0].text.strip() + except: + extracted_title = '' + + header_tr = copy.deepcopy(table.rows[info['header_idx']]._tr) + table._tbl.insert(title_row_idx, header_tr) + title_row = table.rows[title_row_idx] + try: + for c in title_row.cells: + c.text = '' + if extracted_title: + title_row.cells[0].text = extracted_title + except: + pass + continue + except: + pass + + if any((c.text or '').strip() for c in title_row.cells): + continue + + first_data_idx = info['data_with_result'][0] + if first_data_idx >= len(table.rows): + continue + data_row = table.rows[first_data_idx] + + title_text = '' + if len(data_row.cells) > 1: + 
title_text = data_row.cells[1].text.strip() + if not title_text: + title_text = data_row.cells[0].text.strip() + if not title_text: + continue + + for c in title_row.cells: + c.text = '' + title_row.cells[0].text = title_text + except: + pass + + # 删除没有数据且没有表头的表格(保留表头表格) + removed_tables = 0 + for table in list(doc.tables): + info = analyze_table(table) + # 只删除既没有数据也没有表头的表格 + if len(info['data_with_result']) == 0 and info['header_idx'] < 0: + try: + table._tbl.getparent().remove(table._tbl) + removed_tables += 1 + except: + pass + + # === 新增:梳理文档结构 === + # 模块标题关键词(24个文字模块分类) + module_keywords = [ + # 1. 尿液检测 + 'urine detection', 'urine analysis', 'urinalysis', '尿液检测', '尿常规', + # 2. 血常规 + 'complete blood count', 'blood routine', 'cbc', '血常规', + # 3. 血糖 + 'blood sugar', 'glucose', 'blood glucose', '血糖', '糖代谢', + # 4. 血脂 + 'lipid panel', 'lipid profile', 'blood lipid', '血脂', + # 5. 血型 + 'blood type', 'blood group', 'abo', '血型', + # 6. 凝血功能 + 'coagulation', 'clotting', '凝血功能', '凝血', + # 7. 传染病四项 + 'infectious disease', 'hepatitis', '传染病四项', '传染病', + # 8. 血电解质 + 'electrolyte', 'serum electrolyte', '血电解质', '电解质', + # 9. 肝功能 + 'liver function', 'hepatic function', '肝功能', + # 10. 肾功能 + 'kidney function', 'renal function', '肾功能', + # 11. 心肌酶谱 + 'cardiac enzyme', 'myocardial enzyme', '心肌酶谱', '心肌酶', + # 12. 甲状腺功能 + 'thyroid function', 'thyroid', '甲状腺功能', '甲状腺', + # 13. 心脑血管风险因子 + 'cardiovascular risk', 'cerebrovascular', '心脑血管风险因子', '心脑血管', '心血管', + # 14. 骨代谢 + 'bone metabolism', 'bone marker', '骨代谢', + # 15. 微量元素 + 'trace element', 'microelement', 'heavy metal', '微量元素', '重金属', + # 16. 淋巴细胞亚群 + 'lymphocyte subsets', 'lymphocyte subpopulation', '淋巴细胞亚群', + # 17. 体液免疫 + 'humoral immunity', 'immunoglobulin', '体液免疫', + # 18. 炎症反应 + 'inflammation', 'inflammatory', '炎症反应', '炎症', + # 19. 自身抗体 + 'autoantibody', 'autoimmune', '自身抗体', '自身免疫', + # 20. 女性荷尔蒙 + 'female hormone', 'estrogen', 'progesterone', '女性荷尔蒙', '女性激素', + # 21. 
男性荷尔蒙 + 'male hormone', 'testosterone', 'androgen', '男性荷尔蒙', '男性激素', + # 22. 肿瘤标记物 + 'tumor marker', 'cancer marker', '肿瘤标记物', '肿瘤标志物', + # 23. 影像学检查 + 'imaging', 'radiology', 'ultrasound', 'x-ray', 'ct', 'mri', '影像学检查', '影像', + # 24. 女性专项检查 + 'female specific', 'gynecological', 'gynecology', '女性专项检查', '妇科', + ] + + # 排除列表:这些不是检测模块,不应该被识别为模块标题 + exclude_keywords = [ + 'client health program', '客户健康方案', + 'health report', '健康报告', + 'overall health', '整体健康', + 'health assessment', '健康评估', + 'abnormal index', '异常指标', + 'be.u', 'wellness center', + 'name', 'gender', 'age', 'nation', # 用户信息字段 + '姓名', '性别', '年龄', '国籍', + ] + + def contains_exclude_keyword(text: str) -> bool: + """检查文本是否包含排除关键词""" + text_lower = text.lower() + return any(kw in text_lower for kw in exclude_keywords) + + def is_module_title_table(table): + """检查表格是否是模块标题表格""" + if len(table.rows) < 1: + return False + try: + for row_idx in range(min(2, len(table.rows))): + row_text = ' '.join([c.text.lower().strip() for c in table.rows[row_idx].cells]) + # 先检查排除关键词 + if contains_exclude_keyword(row_text): + return False + for kw in module_keywords: + if kw in row_text: + return True + except: + pass + return False + + def table_has_data(table): + """检查表格是否有有效数据""" + info = analyze_table(table) + return len(info['data_with_result']) > 0 + + def is_module_title_paragraph(p_text: str) -> bool: + """检查段落是否是模块标题(文字模块)""" + if not p_text: + return False + text = p_text.strip().lower() + if not text: + return False + # 标题通常很短(避免误匹配正文) + if len(text) > 40: + return False + # 先检查排除关键词 + if contains_exclude_keyword(text): + return False + return any(kw in text for kw in module_keywords) + + # 1. 
Identify modules from the order of the body elements (both paragraph titles and table titles are supported) + from docx.oxml import OxmlElement + from docx.oxml.ns import qn as oxml_qn + from docx.text.paragraph import Paragraph + from docx.table import Table + + body = doc._body._body + body_children = list(body) + + tbl_map = {t._tbl: t for t in doc.tables} + + def get_table_from_elem(elem): + return tbl_map.get(elem) + + def is_blank_paragraph_elem(elem): + try: + p = Paragraph(elem, doc) + return p.text.strip() == '' + except Exception: + return False + + def create_visible_blank_paragraph(): + """Create a visible blank-line paragraph (with a run holding one space so Word does not collapse it)""" + p = OxmlElement('w:p') + pPr = OxmlElement('w:pPr') + spacing = OxmlElement('w:spacing') + spacing.set(oxml_qn('w:after'), '0') + spacing.set(oxml_qn('w:before'), '0') + pPr.append(spacing) + p.append(pPr) + + r = OxmlElement('w:r') + t = OxmlElement('w:t') + t.text = ' ' + r.append(t) + p.append(r) + return p + + def is_module_start_elem(elem): + if elem.tag.endswith('}tbl'): + t = get_table_from_elem(elem) + return bool(t) and is_module_title_table(t) + if elem.tag.endswith('}p'): + try: + p = Paragraph(elem, doc) + return is_module_title_paragraph(p.text) + except Exception: + return False + return False + + # Collect every module start + module_start_indices = [i for i, e in enumerate(body_children) if is_module_start_elem(e)] + + # === Module deletion logic (delete text modules without data, together with their tables) === + # Rule: when no table in a text module has data, delete the module title and all of its tables + removed_modules = 0 + elements_removed_in_modules = 0 + + if module_start_indices: + # Process the modules from back to front to avoid index shifts + for idx in range(len(module_start_indices) - 1, -1, -1): + start_i = module_start_indices[idx] + end_i = module_start_indices[idx + 1] if idx + 1 < len(module_start_indices) else len(body_children) + + # All elements within the module range + module_elements = body_children[start_i:end_i] + + # Check whether any table in the module has data + module_has_data = False + module_tables = [] + for e in module_elements: + if e.tag.endswith('}tbl'): + t = get_table_from_elem(e) + if t: + module_tables.append(e) + if table_has_data(t): + module_has_data = True + + # If the module has no data, delete the module title and all of its tables + if not 
module_has_data and module_tables: + # Delete every element in the module (removing from back to front) + for e in reversed(module_elements): + try: + e.getparent().remove(e) + elements_removed_in_modules += 1 + except Exception: + pass + removed_modules += 1 + + # Re-read the body (indices have changed after the deletions) + body = doc._body._body + body_children = list(body) + + # 2. Add blank lines between tables within a module (paragraph and table titles both count as module boundaries) + space_count = 0 + current_module_started = False + prev_was_data_table = False + + i = 0 + while i < len(body_children): + elem = body_children[i] + + if is_module_start_elem(elem): + current_module_started = True + prev_was_data_table = False + i += 1 + continue + + if current_module_started and elem.tag.endswith('}tbl'): + t = get_table_from_elem(elem) + is_title = bool(t) and is_module_title_table(t) + is_data = bool(t) and (not is_title) and table_has_data(t) + + if is_data: + # Skip blank paragraphs going upward and check whether the previous meaningful element is a data table + j = i - 1 + while j >= 0 and body_children[j].tag.endswith('}p') and is_blank_paragraph_elem(body_children[j]): + j -= 1 + + prev_is_data_table = False + if j >= 0 and body_children[j].tag.endswith('}tbl'): + prev_t = get_table_from_elem(body_children[j]) + if prev_t and (not is_module_title_table(prev_t)) and table_has_data(prev_t): + prev_is_data_table = True + + if prev_is_data_table: + # Make sure there is one "visible blank line" paragraph between the two tables + prev_elem = body_children[i - 1] if i - 1 >= 0 else None + + # Case 1: directly after the previous table (or a non-blank paragraph) => insert a visible blank line + if not (prev_elem is not None and prev_elem.tag.endswith('}p') and is_blank_paragraph_elem(prev_elem)): + empty_p = create_visible_blank_paragraph() + body.insert(i, empty_p) + space_count += 1 + body_children = list(body) + i += 1 + else: + # Case 2: a blank paragraph already exists but may not be visible => add a run holding one space + try: + p_elem = prev_elem + has_run = any(c.tag.endswith('}r') for c in list(p_elem)) + if not has_run: + r = OxmlElement('w:r') + tt = OxmlElement('w:t') + tt.text = ' ' + r.append(tt) + p_elem.append(r) + space_count += 1 + except Exception: + pass + + prev_was_data_table = is_data + elif elem.tag.endswith('}p'): + # If there is already a paragraph between the tables (blank or not), do not insert another + pass + + i += 1 + + 
doc.save(output_path) + print(f"[OK] 清理完成: 删除 {removed_rows} 行, 合并 {merged_count} 对表格, 删除 {removed_tables} 个空表格") + print(f"[OK] 清理特殊表格: 删除 {removed_special_tables} 个空特殊表格") + print(f"[OK] 结构整理: 删除 {removed_modules} 个无数据模块, 删除 {elements_removed_in_modules} 个模块元素, 插入 {space_count} 个表格间空行") + return doc + +def main(): + """Main entry point""" + # Path configuration + template_path = r"c:\Users\UI\Desktop\医疗报告\template_docxtpl.docx" + filled_path = r"c:\Users\UI\Desktop\医疗报告\backend\reports\filled_docxtpl_temp.docx" + reports_dir = Path(__file__).parent / "reports" + reports_dir.mkdir(parents=True, exist_ok=True) + + def get_next_output_path() -> str: + existing = list(reports_dir.glob("filled_report_v*.docx")) + max_v = 0 + for p in existing: + name = p.stem + try: + v_str = name.split("filled_report_v", 1)[1] + v = int(v_str) + if v > max_v: + max_v = v + except (IndexError, ValueError): + continue + return str(reports_dir / f"filled_report_v{max_v + 1}.docx") + + output_path = get_next_output_path() + + # Prefer the data already processed by DeepSeek + deepseek_file = Path(__file__).parent / "deepseek_processed_data.json" + extracted_file = Path(__file__).parent / "extracted_medical_data.json" + + # Load the ABB configuration + from config import load_abb_config + abb_config = load_abb_config() + + use_deepseek = deepseek_file.exists() + + if use_deepseek: + print("使用DeepSeek处理后的数据") + with open(deepseek_file, 'r', encoding='utf-8') as f: + matched_data = json.load(f) + print(f"加载 {len(matched_data)} 个匹配项") + + # Fill directly and skip the matching step + print("\n步骤1: 填充数据...") + fill_template(template_path, matched_data, filled_path) + + print("\n步骤2: 清理空白行...") + clean_empty_rows(filled_path, output_path) + + import os + if os.path.exists(filled_path): + os.remove(filled_path) + + print(f"\n[SUCCESS] 完成! 
输出: {output_path}") + return + + # Original logic: process locally + if not extracted_file.exists(): + print("[ERROR] 未找到提取数据,请先运行 deepseek_process.py 或 extract_and_fill_report.py") + return + + with open(extracted_file, 'r', encoding='utf-8') as f: + data = json.load(f) + + if isinstance(data, dict): + extracted_items = data.get('items', []) + else: + extracted_items = data + + # Clean the data (separate units from reference ranges) + extracted_items = clean_extracted_data(extracted_items) + + print(f"加载 {len(extracted_items)} 个提取项") + + # Use the ABB configuration loaded above + template_abbs = {} + for abb_upper, info in abb_config.get('abb_to_info', {}).items(): + template_abbs[abb_upper] = info + # Handle ABBs that contain "/" + if '/' in abb_upper: + for part in abb_upper.split('/'): + template_abbs[part.strip()] = info + + # Group by ABB + items_by_abb = {} + for item in extracted_items: + abb = item['abb'].upper() + if abb not in items_by_abb: + items_by_abb[abb] = [] + items_by_abb[abb].append(item) + + # Pick the best match for each ABB + best_items = select_best_match(items_by_abb) + + # Match against the template + matched_data = {} + for abb, item in best_items.items(): + if abb in template_abbs: + matched_data[abb] = item + else: + for t_abb in template_abbs: + if abb in t_abb or t_abb in abb: + matched_data[t_abb] = item + break + + print(f"清理后 {len(best_items)} 个有效项, 匹配 {len(matched_data)} 个") + + # Step 1: fill + print("\n步骤1: 填充数据...") + fill_template(template_path, matched_data, filled_path) + + # Step 2: clean up empty rows + print("\n步骤2: 清理空白行...") + clean_empty_rows(filled_path, output_path) + + # Delete the temporary file + import os + if os.path.exists(filled_path): + os.remove(filled_path) + + print(f"\n[SUCCESS] 完成! 输出: {output_path}") + + +if __name__ == "__main__": + main() diff --git a/backend/health_assessment_v2.py b/backend/health_assessment_v2.py new file mode 100644 index 0000000..25b9004 --- /dev/null +++ b/backend/health_assessment_v2.py @@ -0,0 +1,962 @@ +""" +Overall health analysis module V2 +Report analysis generator redesigned around the holistic view of functional medicine + +Core principles: +1. A holistic functional-medicine perspective, focused on "balance of system function" and "anticipating functional trends" +2. Professional, objective, rigorous language +3. 
只呈现"指标状态→功能提示→关注方向",不包含干预方案 +""" + +import json +import re +from typing import Dict, List, Any, Tuple +from docx.oxml import OxmlElement +from docx.oxml.ns import qn + + +# 系统分类映射:将模块映射到四大系统 +SYSTEM_MAPPING = { + # (I) 血液学与炎症状态 + 'Hematology': [ + 'Complete Blood Count', # 血常规 + 'Blood Coagulation', # 凝血功能 + 'Inflammatory Reaction', # 炎症反应 + ], + # (II) 荷尔蒙与内分泌调节 + 'Endocrine': [ + 'Thyroid Function', # 甲状腺功能 + 'Female Hormone', # 女性荷尔蒙 + 'Male Hormone', # 男性荷尔蒙 + 'Bone Metabolism', # 骨代谢 + ], + # (III) 免疫学与感染风险 + 'Immunology': [ + 'Four Infectious Diseases', # 传染病四项 + 'Lymphocyte Subpopulation', # 淋巴细胞亚群 + 'Humoral Immunity', # 体液免疫 + 'Autoantibody', # 自身抗体 + ], + # (IV) 营养与代谢状况 + 'Metabolism': [ + 'Blood Sugar', # 血糖 + 'Lipid Profile', # 血脂 + 'Liver Function', # 肝功能 + 'Kidney Function', # 肾功能 + 'Serum Electrolytes', # 血电解质 + 'Microelement', # 微量元素 + 'Urine Test', # 尿液检测 + 'Myocardial Enzyme', # 心肌酶谱 + 'Thromboembolism', # 心脑血管风险因子 + 'Tumor Markers', # 肿瘤标记物 + ], +} + +# 系统中英文名称 +SYSTEM_NAMES = { + 'Hematology': { + 'en': '(I) Hematology and Inflammatory Status', + 'cn': '(一)血液学与炎症状态' + }, + 'Endocrine': { + 'en': '(II) Hormonal and Endocrine Regulation', + 'cn': '(二)荷尔蒙与内分泌调节' + }, + 'Immunology': { + 'en': '(III) Immunology and Infection Risk', + 'cn': '(三)免疫学与感染风险' + }, + 'Metabolism': { + 'en': '(IV) Nutrition and Metabolic Profile', + 'cn': '(四)营养与代谢状况' + }, +} + + +def get_system_for_module(module_name: str) -> str: + """根据模块名称获取所属系统""" + for system, modules in SYSTEM_MAPPING.items(): + if module_name in modules: + return system + # 默认归入代谢系统 + return 'Metabolism' + + +def classify_items_by_system(matched_data: dict, config: dict = None) -> Dict[str, Dict[str, List]]: + """ + 将所有检测项目按四大系统分类 + + Returns: + { + 'Hematology': { + 'normal': [...], # 正常指标 + 'abnormal': [...], # 异常指标 + 'borderline': [...] # 临界指标 + }, + ... 
+ } + """ + from config import load_abb_config, normalize_abb + + if config is None: + config = load_abb_config() + + abb_to_info = config.get('abb_to_info', {}) + + result = { + 'Hematology': {'normal': [], 'abnormal': [], 'borderline': []}, + 'Endocrine': {'normal': [], 'abnormal': [], 'borderline': []}, + 'Immunology': {'normal': [], 'abnormal': [], 'borderline': []}, + 'Metabolism': {'normal': [], 'abnormal': [], 'borderline': []}, + } + + for abb, data in matched_data.items(): + point = data.get('point', '').strip() + result_val = data.get('result', '').strip() + reference = data.get('reference', '').strip() + unit = data.get('unit', '').strip() + + # Look up the module information + normalized_abb = normalize_abb(abb, config) + info = abb_to_info.get(normalized_abb.upper(), {}) + if not info: + info = abb_to_info.get(abb.upper(), {}) + + module = info.get('module', data.get('module', '')) + system = get_system_for_module(module) + + # Look up the Chinese name + name = info.get('project_cn') or data.get('project_cn') or info.get('project') or data.get('project', abb) + + item_info = { + 'abb': abb, + 'name': name, + 'result': result_val, + 'unit': unit, + 'reference': reference, + 'point': point, + 'module': module, + 'system': system + } + + # Classify as normal, abnormal, or borderline + if point in ['↑', '↓', 'H', 'L', '高', '低']: + # Check whether this is a borderline value (close to the edge of the reference range) + is_borderline = _is_borderline_value(result_val, reference, point) + if is_borderline: + result[system]['borderline'].append(item_info) + else: + result[system]['abnormal'].append(item_info) + else: + # Normal indicator + if result_val: # only add items that have a result + result[system]['normal'].append(item_info) + + return result + + +def _is_borderline_value(result: str, reference: str, point: str) -> bool: + """ + Check whether a value is borderline (deviates from the reference range by no more than 10%) + """ + try: + result_num = float(re.sub(r'[^\d.]', '', result)) + + # Parse the reference range + ref_match = re.search(r'([\d.]+)\s*[-~]\s*([\d.]+)', reference) + if ref_match: + ref_low = float(ref_match.group(1)) + ref_high = float(ref_match.group(2)) + + if point in ['↑', 'H', '高']: + # 
High: check whether the value exceeds the upper limit by no more than 10% + if ref_high > 0: + deviation = (result_num - ref_high) / ref_high + return 0 < deviation <= 0.1 + elif point in ['↓', 'L', '低']: + # Low: check whether the value falls below the lower limit by no more than 10% + if ref_low > 0: + deviation = (ref_low - result_num) / ref_low + return 0 < deviation <= 0.1 + except (TypeError, ValueError): + pass + + return False + + +def collect_all_items_for_assessment(matched_data: dict, api_key: str = None) -> Tuple[List, List, Dict]: + """ + Collect all indicators for the health assessment + + Returns: + (normal_items, abnormal_items, system_classified_data) + """ + from config import load_abb_config, normalize_abb + + config = load_abb_config() + abb_to_info = config.get('abb_to_info', {}) + + normal_items = [] + abnormal_items = [] + + for abb, data in matched_data.items(): + point = data.get('point', '').strip() + result_val = data.get('result', '').strip() + reference = data.get('reference', '').strip() + unit = data.get('unit', '').strip() + + if not result_val: + continue + + # Look up the item information + normalized_abb = normalize_abb(abb, config) + info = abb_to_info.get(normalized_abb.upper(), {}) + if not info: + info = abb_to_info.get(abb.upper(), {}) + + module = info.get('module', data.get('module', '')) + name = info.get('project_cn') or data.get('project_cn') or info.get('project') or data.get('project', abb) + + item_info = { + 'abb': abb, + 'name': name, + 'result': result_val, + 'unit': unit, + 'reference': reference, + 'point': point, + 'module': module, + 'system': get_system_for_module(module) + } + + if point in ['↑', '↓', 'H', 'L', '高', '低']: + abnormal_items.append(item_info) + else: + normal_items.append(item_info) + + # Classify by system + system_data = classify_items_by_system(matched_data, config) + + return normal_items, abnormal_items, system_data + + +def build_assessment_prompt(normal_items: List, abnormal_items: List, system_data: Dict) -> str: + """ + Build the prompt for the overall health analysis (refined from the sample report) + """ + # Build the description of normal indicators + normal_desc = [] + for item in normal_items[:30]: + desc = f" - {item['name']} ({item['abb']}): {item['result']}" + if 
item.get('unit'): + desc += f" {item['unit']}" + if item.get('reference'): + desc += f" [参考: {item['reference']}]" + normal_desc.append(desc) + + # 构建异常指标描述(按系统分组) + abnormal_by_system = {} + for item in abnormal_items: + system = item.get('system', 'Metabolism') + if system not in abnormal_by_system: + abnormal_by_system[system] = [] + + direction = '偏高' if item['point'] in ['↑', 'H', '高'] else '偏低' + is_borderline = _is_borderline_value(item['result'], item.get('reference', ''), item['point']) + level = '临界' if is_borderline else '异常' + + desc = f" - {item['name']} ({item['abb']}): {item['result']}" + if item.get('unit'): + desc += f" {item['unit']}" + desc += f" ({direction}, {level})" + if item.get('reference'): + desc += f" [参考: {item['reference']}]" + abnormal_by_system[system].append(desc) + + # 构建系统分组的异常指标描述 + system_abnormal_desc = [] + for system_key, system_info in SYSTEM_NAMES.items(): + items = abnormal_by_system.get(system_key, []) + if items: + system_abnormal_desc.append(f"\n【{system_info['cn']}】") + system_abnormal_desc.extend(items) + + prompt = f"""# 角色设定 +你是Be.U Med功能医学团队的资深医学顾问,在功能医学、整体健康、抗衰老医学领域具有丰富的临床经验。 + +# 任务 +根据体检者的血液检查报告,撰写"整体健康情况分析"报告。 + +# 检测数据 + +## 正常指标(部分) +{chr(10).join(normal_desc) if normal_desc else ' 暂无数据'} + +## 异常/临界指标(按系统分类) +{chr(10).join(system_abnormal_desc) if system_abnormal_desc else ' 暂无异常指标'} + +# 核心原则(必须严格遵守) + +## 1. 段落格式(极其重要!) +- **每个段落必须先写英文,再写对应的中文** +- **第一段:英文80-120词,中文80-120字** +- **第二段:英文80-100词,中文约120字(严格控制在110-130字之间)** +- 不要英中混排,必须分开 + +## 2. 语言风格 +- 专业、客观、严谨,体现功能医学视角 +- 使用"提示""可能""建议""值得关注""需要注意"等引导词 +- 禁用"必须""一定""保证""治愈"等绝对化表述 +- 不做临床疾病诊断,聚焦功能状态分析 + +## 3. 
核心指标判定 +- **核心指标**:从医学角度判定各生理学系统的关键指标 +- **异常项**:超出参考范围的指标 + 逼近临界值的指标 +- 指标必须精准,标注具体数值及单位 + +# 文章结构(必须严格遵循) + +## 总述概述(2段) +**第一段**:前半部分列重点正常项及数值,后半部分列重点异常项及数值 +**第二段**:说明这些异常指标对整体健康的综合影响 + +## 四大系统分析(固定顺序,每个系统2段) + +### (I) Hematology and Inflammatory Status / (一)血液学与炎症状态 +**第一段**:前半部分列该系统重点正常项及数值,后半部分列重点异常项及数值(含临界值) +**第二段**:说明该系统核心异常指标对其他生理系统的影响 + +### (II) Hormonal and Endocrine Regulation / (二)荷尔蒙与内分泌调节 +**第一段**:前半部分列该系统重点正常项及数值,后半部分列重点异常项及数值(含临界值) +**第二段**:说明该系统核心异常指标对其他生理系统的影响 + +### (III) Immunology and Infection Risk / (三)免疫学与感染风险 +**第一段**:前半部分列该系统重点正常项及数值,后半部分列重点异常项及数值(含临界值) +**第二段**:说明该系统核心异常指标对其他生理系统的影响 + +### (IV) Nutrition and Metabolic Profile / (四)营养与代谢状况 +**第一段**:前半部分列该系统重点正常项及数值,后半部分列重点异常项及数值(含临界值) +**第二段**:说明该系统核心异常指标对其他生理系统的影响 + +## 结尾总结(2段) +**第一段 - 功能医学健康管理重点**:概括本次检测发现的核心健康管理重点 +**第二段 - 个性化管理方向**:说明往哪个方向开展个性化健康管理 + +# 输出格式(JSON) + +```json +{{ + "overview": {{ + "paragraph1": {{ + "en": "英文(80-120词):前半部分列重点正常项及数值,后半部分列重点异常项及数值...", + "cn": "中文(80-120字):对应翻译..." + }}, + "paragraph2": {{ + "en": "英文(80-100词):说明异常指标对整体健康的综合影响...", + "cn": "中文(约120字):对应翻译..." + }} + }}, + "systems": [ + {{ + "key": "Hematology", + "title_en": "(I) Hematology and Inflammatory Status", + "title_cn": "(一)血液学与炎症状态", + "paragraph1": {{ + "en": "英文(80-120词):前半部分列该系统重点正常项及数值,后半部分列重点异常项及数值...", + "cn": "中文(80-120字):对应翻译..." + }}, + "paragraph2": {{ + "en": "英文(80-100词):说明该系统核心异常指标对其他生理系统的影响...", + "cn": "中文(约120字):对应翻译..." 
+ }} + }}, + {{ + "key": "Endocrine", + "title_en": "(II) Hormonal and Endocrine Regulation", + "title_cn": "(二)荷尔蒙与内分泌调节", + "paragraph1": {{}}, + "paragraph2": {{}} + }}, + {{ + "key": "Immunology", + "title_en": "(III) Immunology and Infection Risk", + "title_cn": "(三)免疫学与感染风险", + "paragraph1": {{}}, + "paragraph2": {{}} + }}, + {{ + "key": "Metabolism", + "title_en": "(IV) Nutrition and Metabolic Profile", + "title_cn": "(四)营养与代谢状况", + "paragraph1": {{}}, + "paragraph2": {{}} + }} + ], + "conclusion": {{ + "management_focus": {{ + "en": "英文(80-120词):功能医学健康管理重点概括...", + "cn": "中文(80-120字):对应翻译..." + }}, + "personalized_direction": {{ + "en": "英文(80-120词):个性化管理方向说明...", + "cn": "中文(80-120字):对应翻译..." + }} + }} +}} +``` + +# 重要提示 +1. **每个段落必须先英文后中文,不要混排** +2. **第一段:英文80-120词,中文80-120字** +3. **第二段:英文80-100词,中文约120字(严格控制在110-130字)** +4. **第一段结构:前半部分正常项+数值,后半部分异常项+数值** +5. **第二段结构:异常指标对其他系统的影响** +6. **结尾两段:功能医学管理重点 + 个性化管理方向** +7. **逼近临界值的指标也算作异常项** +8. **只返回JSON,不要其他内容**""" + + return prompt + + + + +def generate_health_assessment_v2(matched_data: dict, api_key: str, call_deepseek_api) -> dict: + """ + 生成整体健康情况分析内容(V2版本) + + Args: + matched_data: 匹配的检测数据 + api_key: DeepSeek API Key + call_deepseek_api: API调用函数 + + Returns: + 包含整体分析和系统分析的字典 + """ + if not api_key: + print(" ⚠️ 未提供API Key,跳过健康评估生成") + return {} + + # 收集所有指标 + normal_items, abnormal_items, system_data = collect_all_items_for_assessment(matched_data) + + if not normal_items and not abnormal_items: + print(" ⚠️ 没有检测数据,跳过健康评估生成") + return {} + + print(f" 📊 数据统计: 正常指标 {len(normal_items)} 个, 异常指标 {len(abnormal_items)} 个") + + # 构建prompt + prompt = build_assessment_prompt(normal_items, abnormal_items, system_data) + + def parse_json_response(response_text): + """解析JSON响应""" + # 提取JSON部分 + if '```json' in response_text: + response_text = response_text.split('```json')[1].split('```')[0] + elif '```' in response_text: + response_text = response_text.split('```')[1].split('```')[0] + + response_text = 
response_text.strip() + + try: + return json.loads(response_text) + except json.JSONDecodeError: + pass + + # Try to repair common problems (unbalanced quotes, braces, or brackets) + if response_text.count('"') % 2 != 0: + response_text += '"' + + open_braces = response_text.count('{') - response_text.count('}') + open_brackets = response_text.count('[') - response_text.count(']') + + if open_brackets > 0: + if open_braces > 0: + response_text += '}' * open_braces + response_text += ']' * open_brackets + elif open_braces > 0: + response_text += '}' * open_braces + + try: + return json.loads(response_text) + except json.JSONDecodeError: + return None + + # Retry at most 3 times + for attempt in range(3): + try: + print(f" 🤖 调用DeepSeek生成整体健康分析... (第{attempt+1}次)") + response = call_deepseek_api(prompt, api_key, max_tokens=6000, timeout=180) + + if response is None: + if attempt < 2: + print(f" ⚠️ API请求失败,重试中...") + import time + time.sleep(3) + continue + + result = parse_json_response(response) + + # Accept the new format (overview, systems) or the old format (overall_analysis, system_analysis) + if result and (result.get('overview') or result.get('systems') or + result.get('overall_analysis') or result.get('system_analysis')): + print(f" ✓ 成功生成整体健康分析") + return result + + if attempt < 2: + print(f" ⚠️ 响应格式不完整,重试中...") + + except Exception as e: + if attempt < 2: + print(f" ⚠️ 生成失败: {e},重试中...") + + print(f" ✗ 生成整体健康分析失败") + return {} + + +def convert_v2_to_sections_format(v2_result: dict) -> dict: + """ + Convert the V2 format into the original sections format so the existing fill functions can be reused + + New format: overview{paragraph1, paragraph2}, systems[], conclusion{management_focus, personalized_direction} + """ + sections = [] + + # 1. 
Overview (2 paragraphs, no title needed) + overview = v2_result.get('overview', {}) + if overview: + paragraphs = [] + + # Paragraph 1: normal items plus abnormal items + para1 = overview.get('paragraph1', {}) + if para1.get('en') or para1.get('cn'): + paragraphs.append({ + 'en': para1.get('en', ''), + 'cn': para1.get('cn', '') + }) + + # Paragraph 2: effect of the abnormal indicators on overall health + para2 = overview.get('paragraph2', {}) + if para2.get('en') or para2.get('cn'): + paragraphs.append({ + 'en': para2.get('en', ''), + 'cn': para2.get('cn', '') + }) + + if paragraphs: + sections.append({ + 'title_en': '', + 'title_cn': '', + 'paragraphs': paragraphs, + 'is_overview': True + }) + + # 2. Analysis of the four systems + systems = v2_result.get('systems', []) + for system in systems: + paragraphs = [] + + # Paragraph 1: normal items plus abnormal items + para1 = system.get('paragraph1', {}) + if para1.get('en') or para1.get('cn'): + paragraphs.append({ + 'en': para1.get('en', ''), + 'cn': para1.get('cn', '') + }) + + # Paragraph 2: effect of the abnormal indicators on other systems + para2 = system.get('paragraph2', {}) + if para2.get('en') or para2.get('cn'): + paragraphs.append({ + 'en': para2.get('en', ''), + 'cn': para2.get('cn', '') + }) + + # Backward compatibility with the old format (a paragraphs array) + if not paragraphs and system.get('paragraphs'): + for para in system.get('paragraphs', []): + if para.get('en') or para.get('cn'): + paragraphs.append({ + 'en': para.get('en', ''), + 'cn': para.get('cn', '') + }) + + if paragraphs: + sections.append({ + 'title_en': system.get('title_en', ''), + 'title_cn': system.get('title_cn', ''), + 'paragraphs': paragraphs + }) + + # 3. 
Closing summary (2 paragraphs) + conclusion = v2_result.get('conclusion', {}) + if conclusion: + paragraphs = [] + + # Paragraph 1: functional-medicine health management priorities + mgmt_focus = conclusion.get('management_focus', {}) + if mgmt_focus.get('en') or mgmt_focus.get('cn'): + paragraphs.append({ + 'en': mgmt_focus.get('en', ''), + 'cn': mgmt_focus.get('cn', '') + }) + + # Paragraph 2: direction for personalized management + pers_dir = conclusion.get('personalized_direction', {}) + if pers_dir.get('en') or pers_dir.get('cn'): + paragraphs.append({ + 'en': pers_dir.get('en', ''), + 'cn': pers_dir.get('cn', '') + }) + + # Backward compatibility with the old format (en/cn directly on conclusion) + if not paragraphs and (conclusion.get('en') or conclusion.get('cn')): + paragraphs.append({ + 'en': conclusion.get('en', ''), + 'cn': conclusion.get('cn', '') + }) + + if paragraphs: + sections.append({ + 'title_en': '', + 'title_cn': '', + 'paragraphs': paragraphs, + 'is_conclusion': True + }) + + # Backward compatibility with the old format (overall_analysis, system_analysis[]) + if not sections: + overall = v2_result.get('overall_analysis', {}) + if overall: + paragraphs = [] + summary = overall.get('summary', {}) + if summary.get('en') or summary.get('cn'): + paragraphs.append({'en': summary.get('en', ''), 'cn': summary.get('cn', '')}) + + strength = overall.get('strength_indicators', {}) + if strength.get('en') or strength.get('cn'): + paragraphs.append({'en': strength.get('en', ''), 'cn': strength.get('cn', '')}) + + abnormal = overall.get('abnormal_indicators', {}) + if abnormal.get('en') or abnormal.get('cn'): + paragraphs.append({'en': abnormal.get('en', ''), 'cn': abnormal.get('cn', '')}) + + focus = overall.get('focus_direction', {}) + if focus.get('en') or focus.get('cn'): + paragraphs.append({'en': focus.get('en', ''), 'cn': focus.get('cn', '')}) + + if paragraphs: + sections.append({ + 'title_en': overall.get('title_en', ''), + 'title_cn': overall.get('title_cn', ''), + 'paragraphs': paragraphs + }) + + system_analysis = v2_result.get('system_analysis', []) + for system in system_analysis: + paragraphs = [] + for key in ['summary', 
'strength_indicators', 'abnormal_indicators', 'focus_direction']: + item = system.get(key, {}) + if item.get('en') or item.get('cn'): + paragraphs.append({'en': item.get('en', ''), 'cn': item.get('cn', '')}) + + if paragraphs: + sections.append({ + 'title_en': system.get('title_en', ''), + 'title_cn': system.get('title_cn', ''), + 'paragraphs': paragraphs + }) + + return {'sections': sections} + + +# ============================================================ +# 文档填充函数 +# ============================================================ + +def clean_markdown_formatting(text: str) -> str: + """清理文本中的Markdown格式标记""" + if not text: + return text + + text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text) + text = re.sub(r'__([^_]+)__', r'\1', text) + text = re.sub(r'(? 0 and (title_en or title_cn): + empty_p = create_empty_paragraph_v2() + body.insert(insert_pos, empty_p) + insert_pos += 1 + + # 小节标题(只有当标题不为空时才插入) + if title_en or title_cn: + title_paragraphs = create_section_title_two_lines_v2(title_en, title_cn) + for title_p in title_paragraphs: + body.insert(insert_pos, title_p) + insert_pos += 1 + + # 段落内容 + for para in paragraphs: + en_text = para.get('en', '') + if en_text: + p_en = create_formatted_paragraph_v2(en_text, is_chinese=False) + body.insert(insert_pos, p_en) + insert_pos += 1 + + cn_text = para.get('cn', '') + if cn_text: + p_cn = create_formatted_paragraph_v2(cn_text, is_chinese=True) + body.insert(insert_pos, p_cn) + insert_pos += 1 + + print(f" ✓ 已插入 {len(sections)} 个健康评估小节") + + +# ============================================================ +# 主入口函数 +# ============================================================ + +def generate_and_fill_health_assessment_v2(doc, matched_data: dict, api_key: str, call_deepseek_api): + """ + 生成并填充整体健康情况分析(V2版本) + + 这是主入口函数,替代原有的 generate_health_assessment_content + fill_health_assessment_section + """ + if not api_key: + print(" ⚠️ 未提供DeepSeek API Key,跳过健康评估生成") + return None + + print("\n" + "=" * 60) + print("整体健康情况分析 V2") 
+ print("=" * 60) + + # 生成内容 + assessment_result = generate_health_assessment_v2(matched_data, api_key, call_deepseek_api) + + if assessment_result: + # 填充到文档 + print("\n 📝 正在填充健康评估内容...") + fill_health_assessment_v2(doc, assessment_result) + print(" ✓ 整体健康情况分析完成") + else: + print(" ✗ 健康评估生成失败") + + return assessment_result + + +# ============================================================ +# 测试函数 +# ============================================================ + +if __name__ == '__main__': + # 测试prompt构建 + test_normal = [ + {'abb': 'WBC', 'name': '白细胞计数', 'result': '6.5', 'unit': '10^9/L', 'reference': '4.0-10.0', 'system': 'Hematology'}, + {'abb': 'RBC', 'name': '红细胞计数', 'result': '4.8', 'unit': '10^12/L', 'reference': '4.0-5.5', 'system': 'Hematology'}, + ] + + test_abnormal = [ + {'abb': 'TSH', 'name': '促甲状腺激素', 'result': '16.879', 'unit': 'μIU/mL', 'reference': '0.35-4.94', 'point': '↑', 'system': 'Endocrine'}, + {'abb': 'AMH', 'name': '抗缪勒管激素', 'result': '0.17', 'unit': 'ng/mL', 'reference': '1.0-10.0', 'point': '↓', 'system': 'Endocrine'}, + ] + + test_system_data = { + 'Hematology': {'normal': test_normal, 'abnormal': [], 'borderline': []}, + 'Endocrine': {'normal': [], 'abnormal': test_abnormal, 'borderline': []}, + 'Immunology': {'normal': [], 'abnormal': [], 'borderline': []}, + 'Metabolism': {'normal': [], 'abnormal': [], 'borderline': []}, + } + + prompt = build_assessment_prompt(test_normal, test_abnormal, test_system_data) + print("=" * 60) + print("生成的Prompt预览(前2000字符):") + print("=" * 60) + print(prompt[:2000]) + print("...") diff --git a/backend/health_content_generator.py b/backend/health_content_generator.py new file mode 100644 index 0000000..3456a99 --- /dev/null +++ b/backend/health_content_generator.py @@ -0,0 +1,1769 @@ +""" +健康内容生成模块 +包含生成健康评估和建议的函数 +""" +import json +import re +from docx.oxml import OxmlElement +from docx.oxml.ns import qn + + +# 常见的医学ABB缩写列表(用于识别特殊英文缩写) +MEDICAL_ABBS = { + 'HIV', 'WBC', 'RBC', 'HCT', 'MCV', 'MCH', 
'MCHC', 'RDW', 'PLT', 'MPV', + 'NEUT', 'LYMPH', 'MONO', 'EOS', 'BAS', 'ESR', 'CRP', 'hs-CRP', + 'TC', 'TG', 'HDL', 'LDL', 'VLDL', 'HbA1C', 'FBS', 'EAG', + 'ALT', 'AST', 'ALP', 'GGT', 'LDH', 'CK', 'CK-MB', + 'BUN', 'Scr', 'UA', 'eGFR', 'Cr', + 'T3', 'T4', 'FT3', 'FT4', 'TSH', 'TgAb', 'TPOAb', + 'IgG', 'IgA', 'IgM', 'IgE', 'C3', 'C4', 'RF', 'ANA', 'ASO', + 'AFP', 'CEA', 'CA125', 'CA19-9', 'CA15-3', 'PSA', 'NSE', 'CYFRA21-1', + 'PT', 'APTT', 'TT', 'INR', 'FIB', 'D-Dimer', + 'Na', 'K', 'Cl', 'Ca', 'Mg', 'P', 'Fe', 'Zn', 'Cu', 'Pb', 'Hg', 'Cd', + 'FSH', 'LH', 'E2', 'PROG', 'PRL', 'DHEAS', 'COR', 'IGF-1', 'AMH', + 'PTH', 'VitD', '25-OH-VD', 'OST', 'TPINP', 'β-CTX', + 'HBsAg', 'HBsAb', 'HBeAg', 'HBeAb', 'HBcAb', 'HCV', 'TRUST', 'TPPA', + 'pH', 'SG', 'PRO', 'GLU', 'KET', 'BIL', 'URO', 'NIT', 'LEU', 'ERY', + 'Hb', 'Fer', 'Hcy', 'ApoB', 'Lp(a)', 'TCO2', 'AG', +} + + +def is_medical_abb(word): + """判断是否是医学ABB缩写""" + # 去除标点 + clean_word = re.sub(r'[,.:;!?()()]', '', word).strip() + if not clean_word: + return False + # 精确匹配 + if clean_word in MEDICAL_ABBS: + return True + # 带%的ABB + if clean_word.endswith('%') and clean_word[:-1] in MEDICAL_ABBS: + return True + # 全大写且长度2-6的可能是ABB + if clean_word.isupper() and 2 <= len(clean_word) <= 6: + return True + # 包含数字的缩写(如CA19-9, HbA1C) + if re.match(r'^[A-Za-z]+[\d-]+[A-Za-z]*\d*$', clean_word): + return True + return False + + +def clean_markdown_formatting(text: str) -> str: + """清理文本中的Markdown格式标记 + + 移除以下Markdown标记: + - **text** 加粗 + - *text* 斜体 + - __text__ 加粗 + - _text_ 斜体 + - `text` 代码 + """ + if not text: + return text + + # 移除 **text** 加粗标记 + text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text) + # 移除 __text__ 加粗标记 + text = re.sub(r'__([^_]+)__', r'\1', text) + # 移除 *text* 斜体标记(注意不要误删单个*) + text = re.sub(r'(? 
bool: + """判断定性结果是否正常""" + if not result or not reference: + return False + result_lower = result.lower().strip().replace('-', '').replace(' ', '') + reference_lower = reference.lower().strip().replace('-', '').replace(' ', '') + if result_lower == reference_lower: + return True + equivalents = { + 'nonreactive': ['nonreactive', 'non reactive'], + 'negative': ['negative', 'neg'], + 'positive': ['positive', 'pos'], + } + for key, variants in equivalents.items(): + r_match = any(v.replace('-', '').replace(' ', '') == result_lower for v in variants) + ref_match = any(v.replace('-', '').replace(' ', '') == reference_lower for v in variants) + if r_match and ref_match: + return True + return False + + +def collect_all_abnormal_items(matched_data: dict, api_key: str = None) -> list: + """收集所有异常项""" + # 加载配置获取中文项目名称 + from config import load_abb_config, normalize_abb + abb_config = load_abb_config() + abb_to_info = abb_config.get('abb_to_info', {}) + + abnormal_items = [] + for key, data in matched_data.items(): + point = data.get('point', '').strip() + result = data.get('result', '').strip() + reference = data.get('reference', '').strip() + if point in ['↑', '↓', 'H', 'L', '高', '低']: + if is_qualitative_result_normal(result, reference): + continue + + # 优先使用data中的abb字段,其次使用字典key + abb = data.get('abb', key) + + # 获取中文项目名称 + normalized_abb = normalize_abb(abb, abb_config) + info = abb_to_info.get(normalized_abb, {}) + if not info: + info = abb_to_info.get(abb, {}) + if not info: + info = abb_to_info.get(normalized_abb.upper(), {}) + if not info: + info = abb_to_info.get(abb.upper(), {}) + # 优先使用配置文件中的中文名称 + name = info.get('project_cn') or data.get('project_cn') + # 如果没有中文名称,调用DeepSeek翻译 + if not name and api_key: + from extract_and_fill_report import translate_project_name_to_chinese + english_name = info.get('project') or data.get('project', abb) + name = translate_project_name_to_chinese(abb, english_name, api_key) + elif not name: + name = info.get('project') or 
data.get('project', abb) + + abnormal_items.append({ + 'abb': abb, + 'name': name, + 'result': result, + 'point': point, + 'reference': reference, + 'unit': data.get('unit', ''), + 'module': data.get('module', '') + }) + return abnormal_items + + +def clean_reference_range(reference: str) -> str: + """ + 清理参考范围格式: + 1. 去掉括号 + 2. 将 0: + gridSpan = OxmlElement('w:gridSpan') + gridSpan.set(qn('w:val'), str(merge_cols)) + tcPr.append(gridSpan) + + # 垂直居中 + vAlign = OxmlElement('w:vAlign') + vAlign.set(qn('w:val'), 'center') + tcPr.append(vAlign) + + tc.append(tcPr) + + p = OxmlElement('w:p') + # 段落居中 + pPr = OxmlElement('w:pPr') + jc = OxmlElement('w:jc') + jc.set(qn('w:val'), 'center') + pPr.append(jc) + p.append(pPr) + + if isinstance(lines, list): + for i, line in enumerate(lines): + r = OxmlElement('w:r') + rPr = OxmlElement('w:rPr') + if bold: + b = OxmlElement('w:b') + rPr.append(b) + # 设置字体 + rFonts = OxmlElement('w:rFonts') + rFonts.set(qn('w:ascii'), 'Times New Roman') + rFonts.set(qn('w:eastAsia'), '宋体') + rPr.append(rFonts) + sz = OxmlElement('w:sz') + sz.set(qn('w:val'), '21') # 10.5pt + rPr.append(sz) + r.append(rPr) + t = OxmlElement('w:t') + t.text = str(line) + r.append(t) + p.append(r) + # 添加换行(除了最后一行) + if i < len(lines) - 1: + r_br = OxmlElement('w:r') + br = OxmlElement('w:br') + r_br.append(br) + p.append(r_br) + else: + r = OxmlElement('w:r') + rPr = OxmlElement('w:rPr') + if bold: + b = OxmlElement('w:b') + rPr.append(b) + # 设置字体 + rFonts = OxmlElement('w:rFonts') + rFonts.set(qn('w:ascii'), 'Times New Roman') + rFonts.set(qn('w:eastAsia'), '宋体') + rPr.append(rFonts) + sz = OxmlElement('w:sz') + sz.set(qn('w:val'), '21') # 10.5pt + rPr.append(sz) + r.append(rPr) + t = OxmlElement('w:t') + t.text = str(lines) if lines else '' + r.append(t) + p.append(r) + + tc.append(p) + return tc + + # 列宽数组 + col_widths = [1417, 2268, 1417, 1417, 1701, 1418] + + def create_row(cells_data): + """创建行""" + tr = OxmlElement('w:tr') + for i, cell_data in 
enumerate(cells_data): + col_w = col_widths[i] if i < len(col_widths) else None + if isinstance(cell_data, tuple): + text, bold, merge = cell_data + tr.append(create_cell_with_lines(text, bold, merge, col_w)) + else: + tr.append(create_cell_with_lines(cell_data, col_width=col_w)) + return tr + + # Row 0: 表头(可选)- 使用列表实现多行 + if include_header: + header_row = create_row([ + (['Abb', '简称'], True, 0), + (['Project', '项目'], True, 0), + (['Result', '结果'], True, 0), + (['Point', '提示'], True, 0), + (['Refer', '参考'], True, 0), + (['Unit', '单位'], True, 0), + ]) + tbl.append(header_row) + + # 数据行 + status = '↑' if item['point'] in ['↑', 'H', '高'] else '↓' + # 清理参考范围格式 + reference = clean_reference_range(item['reference']) + data_row = create_row([ + (item['abb'], True, 0), + (item['name'], True, 0), + (item['result'], False, 0), + (status, False, 0), + (reference, False, 0), + (item['unit'], False, 0), + ]) + tbl.append(data_row) + + # 异常指标汇总表格不需要临床意义行,只显示项目名和数据 + + return tbl + + +def fill_abnormal_index_summary(doc, abnormal_items: list, item_explanations: dict = None): + """ + 填充异常指标汇总表格 + 在 "Abnormal Index异常指标汇总" 标题后插入异常项表格 + 使用与模板相同的表格样式 + + Args: + doc: Word文档对象 + abnormal_items: 异常项列表 + item_explanations: 项目临床意义解释字典 {ABB: {clinical_en: ..., clinical_cn: ...}} + + 步骤: + 1. 找到 Abnormal Index 标题位置 + 2. 删除标题和 Overall Health Assessment 之间的所有表格和段落(占位符) + 3. 
插入新的异常项表格 + """ + if not abnormal_items: + print(" 没有异常项目,跳过异常指标汇总") + return + + if item_explanations is None: + item_explanations = {} + + body = doc.element.body + children = list(body) + + # Locate the "Abnormal Index" heading + abnormal_index_pos = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if 'abnormal index' in text or '异常指标汇总' in text: + abnormal_index_pos = i + print(f" 找到异常指标汇总标题位置: {i}") + break + + if abnormal_index_pos < 0: + print(" 未找到Abnormal Index异常指标汇总位置") + return + + # Locate "Overall Health Assessment" as the end boundary + overall_health_pos = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if 'overall health' in text and 'assessment' in text: + overall_health_pos = i + break + + if overall_health_pos < 0: + # Not found: fall back to a default range + overall_health_pos = abnormal_index_pos + 50 + + # Remove every table and paragraph between Abnormal Index and Overall Health Assessment + # Note: remove from back to front so indices do not shift + children = list(body) # refresh + elements_to_remove = [] + for i in range(abnormal_index_pos + 1, min(overall_health_pos, len(children))): + elem = children[i] + # Collect table and paragraph elements (all placeholder content is removed) + if elem.tag.endswith('}tbl') or elem.tag.endswith('}p'): + elements_to_remove.append(elem) + + # Remove from back to front + for elem in reversed(elements_to_remove): + try: + body.remove(elem) + except ValueError: + # elem is no longer a child of body + pass + + if elements_to_remove: + print(f" 已删除 {len(elements_to_remove)} 个占位符表格") + + # Refresh children and recompute the heading position + children = list(body) + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if 'abnormal index' in text or '异常指标汇总' in text: + abnormal_index_pos = i + break + + # Insert the abnormal-item tables right after the heading + insert_pos = abnormal_index_pos + 1 + + # Build one table per abnormal item + for idx, item in enumerate(abnormal_items): + # Only the first table carries the header row + include_header = (idx == 0) + + # Look up the clinical-significance explanation for this item + abb = item.get('abb', '').upper() + explanation = item_explanations.get(abb, {}) + clinical_en = explanation.get('clinical_en', '') + clinical_cn = explanation.get('clinical_cn', 
'') + + tbl = create_abnormal_item_table_xml(item, idx, include_header, clinical_en, clinical_cn) + body.insert(insert_pos, tbl) + insert_pos += 1 + + # 添加空段落分隔 + p = OxmlElement('w:p') + body.insert(insert_pos, p) + insert_pos += 1 + + # 在异常指标汇总表格后添加分页符(与案例文件格式一致) + # 异常指标汇总应该单独占一页,和 Overall Health Assessment 分开 + page_break_p = OxmlElement('w:p') + page_break_r = OxmlElement('w:r') + page_break_br = OxmlElement('w:br') + page_break_br.set(qn('w:type'), 'page') + page_break_r.append(page_break_br) + page_break_p.append(page_break_r) + body.insert(insert_pos, page_break_p) + + print(f" 已插入异常指标汇总表格,共 {len(abnormal_items)} 个异常项") + + +def generate_health_assessment_content(abnormal_items: list, api_key: str, call_deepseek_api) -> dict: + """调用DeepSeek生成整体健康状况内容""" + if not api_key or not abnormal_items: + return {} + + # 按模块分组异常项 + module_items = {} + for item in abnormal_items: + module = item.get('module', '其他') + if module not in module_items: + module_items[module] = [] + direction = '偏高/High' if item['point'] in ['↑', 'H', '高'] else '偏低/Low' + ref_info = f",参考范围: {item['reference']}" if item.get('reference') else "" + module_items[module].append(f" - {item['name']} ({item['abb']}): {item['result']} {item['unit']} ({direction}){ref_info}") + + # 构建详细的异常指标描述 + abnormal_desc = [] + for module, items in module_items.items(): + abnormal_desc.append(f"【{module}】") + abnormal_desc.extend(items) + + prompt = f"""# 角色设定 +你是一位经验丰富的医疗专家,在大健康、抗衰老行业已经深耕多年,是全世界知名的功能医学专家、全科专家、荷尔蒙专家,在梅奥诊所等多家欧美私立医疗机构任职医疗院长。 + +# 任务 +根据以下患者的血液检查报告异常指标,撰写"整体健康情况"分析报告。 + +# 异常指标汇总 +{chr(10).join(abnormal_desc)} + +# 写作要求 + +## 1. 整体结构 +报告分为两部分: +- **第一部分**:整体健康情况概述(约150字,中英文各150字) +- **第二部分**:四个专项板块的详细分析 + +## 2. 整体健康情况概述(第一个section) +- 标题:Overall Health Overview / 整体健康概述 +- 内容:从功能医学角度,综合评估患者的整体健康状态 +- 字数:英文约150词,中文约150字 +- 要点:概括主要发现、整体健康水平、需要关注的重点领域 + +## 3. 
四个专项板块 +必须严格按照以下顺序: +- (I) Hematology and Inflammatory Status / (一)血液学与炎症状态 +- (II) Hormonal and Endocrine Regulation / (二)荷尔蒙与内分泌调节 +- (III) Immunology and Infection Risk / (三)免疫学与感染风险 +- (IV) Nutrition and Metabolic Profile / (四)营养与代谢状况 + +## 4. 每个专项板块的内容结构(总分结构) +每个板块包含2-3个段落: +- **第1段(总述)**:该系统的整体健康状态评估(约100字) +- **第2-3段(分述)**:具体异常指标的详细分析,必须包含: + - 指标名称和具体数值 + - 参考范围对比 + - 临床意义解读 + - 可能的原因分析 + - 每段约100字 + +## 5. 写作风格 +- 从功能医学与整体健康角度进行专业分析 +- 不要遗漏任何一个异常指标 +- 语言专业但易于理解 +- 体现功能医学"未病先防"的理念 +- 每段英文和中文内容要对应 + +## 6. 重要提示 +- 如果某个板块没有相关异常指标,仍需撰写该板块,说明该系统指标正常 +- 必须在分析中引用具体的检测数值和参考范围 + +# 输出格式(JSON) +```json +{{ + "sections": [ + {{ + "title_en": "Overall Health Overview", + "title_cn": "整体健康概述", + "paragraphs": [ + {{"en": "约150词的整体健康概述...", "cn": "约150字的整体健康概述..."}} + ] + }}, + {{ + "title_en": "(I) Hematology and Inflammatory Status", + "title_cn": "(一)血液学与炎症状态", + "paragraphs": [ + {{"en": "总述:该系统整体状态(约100词)...", "cn": "总述:该系统整体状态(约100字)..."}}, + {{"en": "分述:具体指标分析,包含数值和参考范围(约100词)...", "cn": "分述:具体指标分析,包含数值和参考范围(约100字)..."}} + ] + }}, + {{ + "title_en": "(II) Hormonal and Endocrine Regulation", + "title_cn": "(二)荷尔蒙与内分泌调节", + "paragraphs": [ + {{"en": "总述...", "cn": "总述..."}}, + {{"en": "分述...", "cn": "分述..."}} + ] + }}, + {{ + "title_en": "(III) Immunology and Infection Risk", + "title_cn": "(三)免疫学与感染风险", + "paragraphs": [...] + }}, + {{ + "title_en": "(IV) Nutrition and Metabolic Profile", + "title_cn": "(四)营养与代谢状况", + "paragraphs": [...] + }} + ] +}} +``` + +只返回JSON,不要其他内容。""" + + def parse_json_response(response_text): + """解析JSON响应,带容错处理""" + # 提取JSON部分 + if '```json' in response_text: + response_text = response_text.split('```json')[1].split('```')[0] + elif '```' in response_text: + response_text = response_text.split('```')[1].split('```')[0] + + response_text = response_text.strip() + + # 尝试直接解析 + try: + return json.loads(response_text) + except json.JSONDecodeError: + pass + + # 尝试修复常见问题 + # 1. 
修复未闭合的字符串(在最后添加引号和括号) + if response_text.count('"') % 2 != 0: + response_text += '"' + + # 2. 尝试补全JSON结构 + open_braces = response_text.count('{') - response_text.count('}') + open_brackets = response_text.count('[') - response_text.count(']') + + if open_brackets > 0: + # 检查是否需要先闭合对象 + if open_braces > 0: + response_text += '}' * open_braces + response_text += ']' * open_brackets + elif open_braces > 0: + response_text += '}' * open_braces + + try: + return json.loads(response_text) + except json.JSONDecodeError: + pass + + # 3. 尝试提取部分有效的sections + import re + sections = [] + section_pattern = r'\{\s*"title_en"\s*:\s*"([^"]+)"\s*,\s*"title_cn"\s*:\s*"([^"]+)"\s*,\s*"paragraphs"\s*:\s*\[\s*\{\s*"en"\s*:\s*"([^"]*(?:[^"\\]|\\.)*?)"\s*,\s*"cn"\s*:\s*"([^"]*(?:[^"\\]|\\.)*?)"\s*\}' + matches = re.findall(section_pattern, response_text, re.DOTALL) + + for match in matches: + sections.append({ + "title_en": match[0], + "title_cn": match[1], + "paragraphs": [{"en": match[2], "cn": match[3]}] + }) + + if sections: + return {"sections": sections} + + return None + + # 最多重试3次 + for attempt in range(3): + try: + # 使用更大的max_tokens和更长的超时时间生成长文本 + response = call_deepseek_api(prompt, api_key, max_tokens=4000, timeout=180) + # 检查API是否返回None(超时或错误) + if response is None: + if attempt < 2: + print(f" ⚠️ 第{attempt+1}次API请求失败(超时或错误),重试中...") + import time + time.sleep(3) # 等待3秒后重试 + continue + result = parse_json_response(response) + if result and result.get('sections'): + print(f" ✓ 成功生成 {len(result['sections'])} 个健康评估板块") + return result + if attempt < 2: + print(f" ⚠️ 第{attempt+1}次生成格式不完整,重试中...") + except Exception as e: + if attempt < 2: + print(f" ⚠️ 第{attempt+1}次生成失败: {e},重试中...") + + print(f" 生成健康评估失败: 多次尝试后仍无法获取有效响应") + return {} + + +def generate_functional_health_advice(abnormal_items: list, api_key: str, call_deepseek_api) -> dict: + """调用DeepSeek生成功能医学健康建议内容(新版5模块结构化格式)""" + if not api_key or not abnormal_items: + return {} + + # 按模块分组异常项 + module_items = {} + for item 
in abnormal_items: + module = item.get('module', '其他') + if module not in module_items: + module_items[module] = [] + direction = '偏高/High' if item['point'] in ['↑', 'H', '高'] else '偏低/Low' + ref_info = f",参考范围: {item['reference']}" if item.get('reference') else "" + module_items[module].append(f" - {item['name']} ({item['abb']}): {item['result']} {item['unit']} ({direction}){ref_info}") + + # 构建详细的异常指标描述 + abnormal_desc = [] + for module, items in module_items.items(): + abnormal_desc.append(f"【{module}】") + abnormal_desc.extend(items) + + prompt = f"""# 角色设定 +你是Be.U Med功能医学团队的资深健康管理顾问,在功能医学、营养医学、运动医学、睡眠医学及生活方式干预领域具有丰富的临床经验。 + +# 任务 +根据以下患者的血液检查报告异常指标,撰写"功能医学健康建议"方案。该方案位于「医学干预」建议方案之后,侧重于日常可执行的健康管理策略。 + +# 异常指标汇总 +{chr(10).join(abnormal_desc)} + +# 核心原则 +- 全篇禁止出现任何检验指标的具体数值、参考区间、单位、百分号或数字(0-9) +- 可以提及指标名称(如"黄体酮偏低""ESR升高"),但不要写具体数值 +- 每个段落必须先写英文,再写对应的中文,不要混排 +- 严禁使用不确定表述(may, might, could, 可能, 也许, 似乎等) +- 使用肯定表述:建议、支持、助力、优化、需要、应当 + +# 五个模块(必须严格按顺序) +1. Nutrition Intervention 营养干预 +2. Exercise Intervention 运动干预 +3. Sleep & Stress Management 睡眠与压力管理 +4. Lifestyle Adjustment 生活方式调整 +5. Long-term Follow-up Plan 长期随访计划 + +# 每个模块的内容结构 +- overview: 领域概述(英文约100词,中文约120字)—— 该领域在功能医学中的重要性 +- analysis: 检测关联分析(英文约80词,中文约100字)—— 结合异常指标说明干预必要性(不写数值) +- recommendations: 3-5条具体建议,每条包含英文(30-50词)和中文(50-80字),要具体可执行 +- summary: 总结意义(英文约80词,中文约100字)—— 该干预的整体价值和协同作用 + +# 输出格式(JSON) +```json +{{ + "sections": [ + {{ + "title_en": "Nutrition Intervention", + "title_cn": "营养干预", + "overview": {{ + "en": "英文领域概述(约100词)...", + "cn": "中文领域概述(约120字)..." + }}, + "analysis": {{ + "en": "英文检测关联分析(约80词,不写数值)...", + "cn": "中文检测关联分析(约100字)..." 
+ }}, + "recommendations": [ + {{"en": "Supplementation of B vitamins, folate, and iron to support hematopoiesis and cellular energy production.", "cn": "补充维生素B族、叶酸和铁,以支持造血功能和细胞能量产生;"}}, + {{"en": "Adequate protein and healthy fats with cofactors to support hormonal balance.", "cn": "摄入足够的优质蛋白和健康脂肪(并配合锌、硒、镁等辅因子),以维持荷尔蒙平衡;"}}, + {{"en": "Anti-inflammatory nutrients (omega-3, vitamins C/E, polyphenols) to reduce inflammation.", "cn": "用抗炎营养素(如ω-3、维生素C/E、多酚类)降低炎症保护肠道健康。"}} + ], + "summary": {{ + "en": "英文总结意义(约80词)...", + "cn": "中文总结意义(约100字)..." + }} + }}, + {{ + "title_en": "Exercise Intervention", + "title_cn": "运动干预", + "overview": {{"en": "...", "cn": "..."}}, + "analysis": {{"en": "...", "cn": "..."}}, + "recommendations": [...], + "summary": {{"en": "...", "cn": "..."}} + }}, + {{ + "title_en": "Sleep & Stress Management", + "title_cn": "睡眠与压力管理", + "overview": {{"en": "...", "cn": "..."}}, + "analysis": {{"en": "...", "cn": "..."}}, + "recommendations": [...], + "summary": {{"en": "...", "cn": "..."}} + }}, + {{ + "title_en": "Lifestyle Adjustment", + "title_cn": "生活方式调整", + "overview": {{"en": "...", "cn": "..."}}, + "analysis": {{"en": "...", "cn": "..."}}, + "recommendations": [...], + "summary": {{"en": "...", "cn": "..."}} + }}, + {{ + "title_en": "Long-term Follow-up Plan", + "title_cn": "长期随访计划", + "overview": {{"en": "...", "cn": "..."}}, + "analysis": {{"en": "...", "cn": "..."}}, + "recommendations": [...], + "summary": {{"en": "...", "cn": "..."}} + }} + ] +}} +``` + +只返回JSON,不要其他内容。请确保每个板块都有完整的内容结构。""" + + def parse_json_response(response_text): + """解析JSON响应,带容错处理""" + # 提取JSON部分 + if '```json' in response_text: + response_text = response_text.split('```json')[1].split('```')[0] + elif '```' in response_text: + response_text = response_text.split('```')[1].split('```')[0] + + response_text = response_text.strip() + + # 尝试直接解析 + try: + return json.loads(response_text) + except json.JSONDecodeError: + pass + + # 尝试修复常见问题 + if 
response_text.count('"') % 2 != 0: + response_text += '"' + + open_braces = response_text.count('{') - response_text.count('}') + open_brackets = response_text.count('[') - response_text.count(']') + + if open_brackets > 0: + if open_braces > 0: + response_text += '}' * open_braces + response_text += ']' * open_brackets + elif open_braces > 0: + response_text += '}' * open_braces + + try: + return json.loads(response_text) + except json.JSONDecodeError: + pass + + # 尝试提取部分有效的sections + import re + sections = [] + section_pattern = r'\{\s*"title_en"\s*:\s*"([^"]+)"\s*,\s*"title_cn"\s*:\s*"([^"]+)"\s*,\s*"paragraphs"\s*:\s*\[\s*\{\s*"en"\s*:\s*"([^"]*(?:[^"\\]|\\.)*?)"\s*,\s*"cn"\s*:\s*"([^"]*(?:[^"\\]|\\.)*?)"\s*\}' + matches = re.findall(section_pattern, response_text, re.DOTALL) + + for match in matches: + sections.append({ + "title_en": match[0], + "title_cn": match[1], + "paragraphs": [{"en": match[2], "cn": match[3]}] + }) + + if sections: + return {"sections": sections} + + return None + + # 最多重试3次 + for attempt in range(3): + try: + # 使用更大的max_tokens和更长的超时时间生成长文本(内容较多,需要8000 tokens) + response = call_deepseek_api(prompt, api_key, max_tokens=8000, timeout=300) + # 检查API是否返回None(超时或错误) + if response is None: + if attempt < 2: + print(f" ⚠️ 第{attempt+1}次API请求失败(超时或错误),重试中...") + import time + time.sleep(3) # 等待3秒后重试 + continue + result = parse_json_response(response) + if result and result.get('sections'): + # 检查每个section的内容数量(兼容新旧格式) + total_items = sum( + len(s.get('recommendations', [])) or len(s.get('paragraphs', [])) + for s in result['sections'] + ) + print(f" ✓ 成功生成 {len(result['sections'])} 个功能医学建议板块,共 {total_items} 个内容项") + return result + if attempt < 2: + print(f" ⚠️ 第{attempt+1}次生成格式不完整,重试中...") + except Exception as e: + if attempt < 2: + print(f" ⚠️ 第{attempt+1}次生成失败: {e},重试中...") + + print(f" 生成功能医学健康建议失败: 多次尝试后仍无法获取有效响应") + return {} + + +def fill_health_assessment_section(doc, assessment_content: dict): + """将健康评估内容填充到文档的Overall Health 
Assessment区域 + + 策略: + 1. 找到 "Overall Health Assessment" 标题位置 + 2. 删除标题之后、下一个主要区域之前的所有模板内容(不保护介绍段落) + 3. 在标题之后直接插入DeepSeek生成的内容(包含五个模块表头和内容) + + 格式参考模板: + - 区域标题后:1个空段落 + - 每个模块标题前:1个空段落 + """ + sections = assessment_content.get('sections', []) + if not sections: + print(" 健康评估内容为空,跳过填充") + return + + body = doc.element.body + children = list(body) + + # Locate the "Overall Health Assessment" heading + overall_start = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if 'overall health' in text and 'assessment' in text: + overall_start = i + print(f" 找到Overall Health Assessment位置: {i}") + break + + if overall_start < 0: + print(" 未找到Overall Health Assessment位置") + return + + # Locate the next major section (the end boundary for deletion) + # Candidates: Medical Intervention, Functional Medical Health Advice + next_section_pos = len(children) + end_keywords = ['medical intervention', '医学干预', + 'functional medical health advice', '功能医学健康建议'] + + for i in range(overall_start + 1, len(children)): + text = ''.join(children[i].itertext()).strip().lower() + if any(kw in text for kw in end_keywords): + next_section_pos = i + print(f" 找到下一区域位置: {i}") + break + + # Remove all template content between the heading and the next section (page breaks included, since none is needed before Medical Intervention) + children = list(body) # refresh + elements_to_remove = [] + for i in range(overall_start + 1, min(next_section_pos, len(children))): + elem = children[i] + # Skip sectPr elements + if elem.tag.endswith('}sectPr'): + continue + # Page breaks are no longer kept; none is needed before Medical Intervention + elements_to_remove.append(elem) + + for elem in elements_to_remove: + try: + body.remove(elem) + except ValueError: + # elem is no longer a child of body + pass + + if elements_to_remove: + print(f" 已删除 {len(elements_to_remove)} 个模板占位内容") + + # Recompute the position, since elements were removed + children = list(body) + insert_pos = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if 'overall health' in text and 'assessment' in text: + insert_pos = i + 1 + break + + if insert_pos < 0: + print(" 无法确定插入位置") + return + + # Insert the generated content (section headings and body text all come from DeepSeek) + for 
idx, section in enumerate(sections): + title_en = section.get('title_en', '') + title_cn = section.get('title_cn', '') + paragraphs = section.get('paragraphs', []) + + # 只在第二个及之后的模块标题前插入空段落(第一个模块紧跟区域标题,不需要空段落) + if idx > 0: + empty_p = create_empty_paragraph() + body.insert(insert_pos, empty_p) + insert_pos += 1 + + # 插入小节标题(两行格式:英文一行,中文一行)- 参考模板 Overall Health Assessment 格式 + title_paragraphs = create_section_title_two_lines(title_en, title_cn) + for title_p in title_paragraphs: + body.insert(insert_pos, title_p) + insert_pos += 1 + + # 插入段落内容(全部由DeepSeek生成) + for para in paragraphs: + # 英文段落 + en_text = para.get('en', '') + if en_text: + p_en = create_formatted_paragraph(en_text, is_chinese=False) + body.insert(insert_pos, p_en) + insert_pos += 1 + + # 中文段落 + cn_text = para.get('cn', '') + if cn_text: + p_cn = create_formatted_paragraph(cn_text, is_chinese=True) + body.insert(insert_pos, p_cn) + insert_pos += 1 + + print(f" 已插入 {len(sections)} 个健康评估小节") + + +def create_page_break_paragraph(): + """创建包含分页符的段落""" + p = OxmlElement('w:p') + r = OxmlElement('w:r') + br = OxmlElement('w:br') + br.set(qn('w:type'), 'page') + r.append(br) + p.append(r) + return p + + +def fill_functional_health_advice_section(doc, advice_content: dict): + """将功能医学健康建议内容填充到文档 + + 策略: + 1. 找到 "Functional Medical Health Advice" 标题位置 + 2. 在标题前插入分页符 + 3. 如果找不到,在 "Medical Intervention" 区域结束后创建该区域 + 4. 删除标题之后、下一个主要区域之前的所有模板内容(不保护介绍段落) + 5. 
在标题之后直接插入DeepSeek生成的内容(包含五个模块表头和内容) + """ + sections = advice_content.get('sections', []) + if not sections: + print(" 功能医学健康建议内容为空,跳过填充") + return + + body = doc.element.body + children = list(body) + + # 查找 "Functional Medical Health Advice" 位置 + advice_start = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip() + text_lower = text.lower() + if ('functional medical health advice' in text_lower or + '功能医学健康建议' in text or + ('functional' in text_lower and 'medical' in text_lower and 'health' in text_lower and 'advice' in text_lower)): + advice_start = i + print(f" 找到Functional Medical Health Advice位置: {i}") + break + + if advice_start < 0: + # 尝试更宽松的匹配 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip() + text_lower = text.lower() + if 'functional' in text_lower and 'advice' in text_lower: + advice_start = i + print(f" 找到功能医学建议位置(宽松匹配): {i}") + break + + # 在FHA标题前确保有分页符 + if advice_start >= 0: + already_has_break = False + if advice_start > 0: + prev_elem = children[advice_start - 1] + br_elem = prev_elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br') + if br_elem is not None: + break_type = br_elem.get('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type') + if break_type == 'page': + already_has_break = True + if not already_has_break: + page_break = create_page_break_paragraph() + body.insert(advice_start, page_break) + advice_start += 1 # 标题位置偏移 + print(f" 已在FHA标题前插入分页符") + # 重新获取children + children = list(body) + + # 重建 FHA 标题:单行格式 + 4二级-标题样式(与整体健康情况标题格式一致:华文楷体、四号、加粗) + if advice_start >= 0: + old_title = children[advice_start] + # 创建新的单行标题段落 + new_title = OxmlElement('w:p') + new_pPr = OxmlElement('w:pPr') + new_pStyle = OxmlElement('w:pStyle') + new_pStyle.set(qn('w:val'), '4-') + new_pPr.append(new_pStyle) + new_title.append(new_pPr) + new_r = OxmlElement('w:r') + new_t = OxmlElement('w:t') + new_t.set('{http://www.w3.org/XML/1998/namespace}space', 
'preserve') + new_t.text = 'Functional Medical Health Advice 功能医学健康建议' + new_r.append(new_t) + new_title.append(new_r) + # 替换模板原标题 + old_title.addprevious(new_title) + body.remove(old_title) + children = list(body) + # 更新 advice_start 位置 + for i, elem in enumerate(children): + if elem is new_title: + advice_start = i + break + print(f" 已重建FHA标题为单行格式 + 4二级-标题样式") + + if advice_start < 0: + # 如果找不到,需要创建该区域 + print(" 未找到Functional Medical Health Advice位置,尝试创建...") + + insert_after_pos = -1 + + # 首先找到 "Medical Intervention" 的位置 + medical_intervention_pos = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if 'medical intervention' in text or '医学干预' in text: + medical_intervention_pos = i + print(f" 找到Medical Intervention位置: {i}") + break + + if medical_intervention_pos >= 0: + # 从 Medical Intervention 之后找到该区域的结束位置 + end_keywords = ['功能医学检测档案', 'functional medical examination', + '客户功能医学检测档案', '尿液检测', 'urine detection', 'urine test'] + + for i in range(medical_intervention_pos + 1, len(children)): + text = ''.join(children[i].itertext()).strip().lower() + if any(kw in text for kw in end_keywords): + insert_after_pos = i + print(f" 找到Medical Intervention结束位置: {i}") + break + + # 如果没找到 Medical Intervention,尝试找 Overall Health Assessment 之后的位置 + if insert_after_pos < 0: + overall_health_pos = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if 'overall health' in text and 'assessment' in text: + overall_health_pos = i + break + + if overall_health_pos >= 0: + end_keywords = ['功能医学检测档案', 'functional medical examination', + '客户功能医学检测档案', '尿液检测', 'urine detection', 'urine test'] + for i in range(overall_health_pos + 1, len(children)): + text = ''.join(children[i].itertext()).strip().lower() + if any(kw in text for kw in end_keywords): + insert_after_pos = i + print(f" 找到插入位置(Overall Health之后): {i}") + break + + if insert_after_pos < 0: + print(" 无法确定插入位置,跳过功能医学健康建议") + return + + # 
在FHA区域前插入分页符 + page_break = create_page_break_paragraph() + body.insert(insert_after_pos, page_break) + insert_after_pos += 1 + print(f" 已在FHA区域前插入分页符") + + # 创建标题(使用 4二级-标题 样式,与整体健康情况标题格式一致:华文楷体、四号、加粗) + print(f" 在位置 {insert_after_pos} 创建Functional Medical Health Advice区域") + + title_text = "Functional Medical Health Advice 功能医学健康建议" + title_p = OxmlElement('w:p') + title_pPr = OxmlElement('w:pPr') + title_pStyle = OxmlElement('w:pStyle') + title_pStyle.set(qn('w:val'), '4-') + title_pPr.append(title_pStyle) + title_p.append(title_pPr) + title_r = OxmlElement('w:r') + title_t = OxmlElement('w:t') + title_t.set('{http://www.w3.org/XML/1998/namespace}space', 'preserve') + title_t.text = title_text + title_r.append(title_t) + title_p.append(title_r) + body.insert(insert_after_pos, title_p) + insert_pos = insert_after_pos + 1 + + # 标题后插入1个空段落(参考模板格式) + empty_p = create_empty_paragraph() + body.insert(insert_pos, empty_p) + insert_pos += 1 + + # 直接插入DeepSeek生成的内容(五个模块表头 + 内容) + for idx, section in enumerate(sections): + title_en = section.get('title_en', '') + title_cn = section.get('title_cn', '') + + # 在每个模块标题前插入1个空段落(参考模板格式) + empty_p = create_empty_paragraph() + body.insert(insert_pos, empty_p) + insert_pos += 1 + + # 插入小节标题(单行格式:英文 + 中文)- 参考模板 Functional Medical Health Advice 格式 + section_title_p = create_section_title_one_line(title_en, title_cn) + body.insert(insert_pos, section_title_p) + insert_pos += 1 + + # 支持新结构化格式(overview/analysis/recommendations/summary)和旧格式(paragraphs) + if section.get('overview'): + # 新结构化格式 + overview = section['overview'] + if overview.get('en'): + body.insert(insert_pos, create_formatted_paragraph(overview['en'], is_chinese=False)) + insert_pos += 1 + if overview.get('cn'): + body.insert(insert_pos, create_formatted_paragraph(overview['cn'], is_chinese=True)) + insert_pos += 1 + + analysis = section.get('analysis', {}) + if analysis.get('en'): + body.insert(insert_pos, create_formatted_paragraph(analysis['en'], is_chinese=False)) + 
insert_pos += 1 + if analysis.get('cn'): + body.insert(insert_pos, create_formatted_paragraph(analysis['cn'], is_chinese=True)) + insert_pos += 1 + + body.insert(insert_pos, create_formatted_paragraph("Recommended strategies include:", is_chinese=False)) + insert_pos += 1 + body.insert(insert_pos, create_formatted_paragraph("建议措施包括:", is_chinese=True)) + insert_pos += 1 + + for rec in section.get('recommendations', []): + if rec.get('en'): + body.insert(insert_pos, create_formatted_paragraph(rec['en'], is_chinese=False)) + insert_pos += 1 + if rec.get('cn'): + body.insert(insert_pos, create_formatted_paragraph(rec['cn'], is_chinese=True)) + insert_pos += 1 + + summary = section.get('summary', {}) + if summary.get('en'): + body.insert(insert_pos, create_formatted_paragraph(summary['en'], is_chinese=False)) + insert_pos += 1 + if summary.get('cn'): + body.insert(insert_pos, create_formatted_paragraph(summary['cn'], is_chinese=True)) + insert_pos += 1 + else: + # 旧格式(paragraphs数组) + for para in section.get('paragraphs', []): + en_text = para.get('en', '') + if en_text: + p_en = create_formatted_paragraph(en_text, is_chinese=False) + body.insert(insert_pos, p_en) + insert_pos += 1 + cn_text = para.get('cn', '') + if cn_text: + p_cn = create_formatted_paragraph(cn_text, is_chinese=True) + body.insert(insert_pos, p_cn) + insert_pos += 1 + + print(f" 已创建并插入 {len(sections)} 个功能医学健康建议小节") + return + + # 找到下一个主要区域的位置(作为删除的结束边界) + children = list(body) # 重新获取 + next_section_pos = len(children) + end_keywords = ['功能医学检测档案', 'functional medical examination file', + '尿液检测', 'urine detection', + '客户功能医学检测档案', 'client functional medical'] + + for i in range(advice_start + 1, len(children)): + text = ''.join(children[i].itertext()).strip().lower() + if any(kw in text for kw in end_keywords): + next_section_pos = i + print(f" 找到下一区域位置: {i}") + break + + # 删除标题之后、下一区域之前的所有模板内容(不再保护介绍段落) + children = list(body) # 重新获取 + elements_to_remove = [] + for i in range(advice_start + 1, 
min(next_section_pos, len(children))): + elem = children[i] + # Skip sectPr elements + if not elem.tag.endswith('}sectPr'): + elements_to_remove.append(elem) + + for elem in elements_to_remove: + try: + body.remove(elem) + except ValueError: + # elem is no longer a child of body + pass + + if elements_to_remove: + print(f" 已删除 {len(elements_to_remove)} 个模板占位内容") + + # Recompute the position, since elements were removed + children = list(body) + insert_pos = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if 'functional medical health advice' in text or '功能医学健康建议' in text: + insert_pos = i + 1 + break + + if insert_pos < 0: + print(" 无法确定插入位置") + return + + # Insert the fixed introductory paragraphs + intro_paragraphs = [ + { + "en": "Functional medicine goes beyond the diagnosis and medical treatment of diseases, placing greater emphasis on comprehensive health management for each individual. Beyond the aforementioned \"medical intervention\", the core of functional medicine lies in helping individuals improve their lifestyle from the root, optimize bodily functions, and enhance overall health. 
Through a comprehensive assessment of metabolism, immunity, hormones, nutrition, emotions, and daily habits, a personalized health optimization pathway can be tailored for each client.", + "cn": "功能医学不仅仅停留在疾病的诊断与医学治疗,更强调对个体的全方位健康管理。在上述「医学干预」之外,功能医学的核心在于帮助人们从源头改善生活方式、优化身体功能与提升整体健康状态。通过对代谢、免疫、荷尔蒙、营养、情绪及生活习惯等多个维度的综合评估,可以为每一位客户量身定制个性化的健康提升路径。" + }, + { + "en": "Based on your test results and individual health status, the Be.U Med Functional Medicine Team provides you with scientific and actionable recommendations in the areas of nutrition adjustment, exercise prescription, sleep and stress management, and lifestyle optimization, aiming to help you achieve long-term health, chronic disease prevention, and overall well-being.", + "cn": "基于您的检测结果与个人健康状况,Be.U Med功能医学团队从营养调节、运动处方、睡眠与压力管理、生活方式优化等方面,为您提出科学、可执行的健康建议,旨在帮助您实现真正的长期健康、慢病预防与身心平衡。" + } + ] + + for intro_para in intro_paragraphs: + en_text = intro_para.get('en', '') + if en_text: + p_en = create_formatted_paragraph(en_text, is_chinese=False) + body.insert(insert_pos, p_en) + insert_pos += 1 + cn_text = intro_para.get('cn', '') + if cn_text: + p_cn = create_formatted_paragraph(cn_text, is_chinese=True) + body.insert(insert_pos, p_cn) + insert_pos += 1 + + # 插入五个模块内容 + for idx, section in enumerate(sections): + title_en = section.get('title_en', '') + title_cn = section.get('title_cn', '') + + # 在每个模块标题前插入1个空段落 + empty_p = create_empty_paragraph() + body.insert(insert_pos, empty_p) + insert_pos += 1 + + # 插入小节标题(单行格式:英文 + 中文) + title_p = create_section_title_one_line(title_en, title_cn) + body.insert(insert_pos, title_p) + insert_pos += 1 + + # 支持新结构化格式(overview/analysis/recommendations/summary)和旧格式(paragraphs) + if section.get('overview'): + # 新结构化格式 + # 领域概述 + overview = section['overview'] + if overview.get('en'): + body.insert(insert_pos, create_formatted_paragraph(overview['en'], is_chinese=False)) + insert_pos += 1 + if overview.get('cn'): + body.insert(insert_pos, create_formatted_paragraph(overview['cn'], 
is_chinese=True)) + insert_pos += 1 + + # Test-correlation analysis + analysis = section.get('analysis', {}) + if analysis.get('en'): + body.insert(insert_pos, create_formatted_paragraph(analysis['en'], is_chinese=False)) + insert_pos += 1 + if analysis.get('cn'): + body.insert(insert_pos, create_formatted_paragraph(analysis['cn'], is_chinese=True)) + insert_pos += 1 + + # Lead-in line for the recommendations + body.insert(insert_pos, create_formatted_paragraph("Recommended strategies include:", is_chinese=False)) + insert_pos += 1 + body.insert(insert_pos, create_formatted_paragraph("建议措施包括:", is_chinese=True)) + insert_pos += 1 + + # Individual recommendations + for rec in section.get('recommendations', []): + if rec.get('en'): + body.insert(insert_pos, create_formatted_paragraph(rec['en'], is_chinese=False)) + insert_pos += 1 + if rec.get('cn'): + body.insert(insert_pos, create_formatted_paragraph(rec['cn'], is_chinese=True)) + insert_pos += 1 + + # Closing summary + summary = section.get('summary', {}) + if summary.get('en'): + body.insert(insert_pos, create_formatted_paragraph(summary['en'], is_chinese=False)) + insert_pos += 1 + if summary.get('cn'): + body.insert(insert_pos, create_formatted_paragraph(summary['cn'], is_chinese=True)) + insert_pos += 1 + else: + # Legacy format (paragraphs array) + for para in section.get('paragraphs', []): + en_text = para.get('en', '') + if en_text: + body.insert(insert_pos, create_formatted_paragraph(en_text, is_chinese=False)) + insert_pos += 1 + cn_text = para.get('cn', '') + if cn_text: + body.insert(insert_pos, create_formatted_paragraph(cn_text, is_chinese=True)) + insert_pos += 1 + + print(f" 已插入 {len(sections)} 个功能医学健康建议小节") + + # Insert a page break before "客户功能医学检测档案" (if there is not one already) + children = 
list(body) + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip() + if '功能医学检测档案' in text or 'Functional Medical Examination File' in text: + # 检查前一个元素是否已经是分页符 + already_has_break = False + if i > 0: + prev_elem = children[i - 1] + br_elem = prev_elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br') + if br_elem is not None: + break_type = br_elem.get('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type') + if break_type == 'page': + already_has_break = True + + if not already_has_break: + # 在该元素前插入分页符 + page_break = create_page_break_paragraph() + body.insert(i, page_break) + print(f" 已在'客户功能医学检测档案'前插入分页符") + else: + print(f" '客户功能医学检测档案'前已有分页符,跳过插入") + break + + +def generate_item_explanations(abnormal_items: list, api_key: str, call_deepseek_api) -> dict: + """ + 为异常项获取临床意义解释 + 优先使用模板中的解释,只有模板中没有的才调用DeepSeek生成 + + Args: + abnormal_items: 异常项列表 + api_key: DeepSeek API Key + call_deepseek_api: API调用函数 + + Returns: + {ABB: {clinical_en: ..., clinical_cn: ...}, ...} + """ + from pathlib import Path + + result = {} + items_need_generation = [] + + # 1. 首先尝试从模板解释文件获取 + template_explanations_file = Path(__file__).parent / "template_explanations.json" + template_explanations = {} + + if template_explanations_file.exists(): + try: + with open(template_explanations_file, 'r', encoding='utf-8') as f: + template_explanations = json.load(f) + print(f" ✓ 已加载 {len(template_explanations)} 个模板解释") + except Exception as e: + print(f" ⚠️ 加载模板解释失败: {e}") + + # 2. 
检查每个异常项是否有模板解释 + for item in abnormal_items: + abb = item.get('abb', '').upper().strip() + if not abb: + continue + + # 尝试多种匹配方式 + found = False + + # 直接匹配 + if abb in template_explanations: + exp = template_explanations[abb] + if exp.get('clinical_en') and exp.get('clinical_cn'): + result[abb] = exp + found = True + + # 去除特殊字符后匹配 + if not found: + abb_clean = ''.join(c for c in abb if c.isalnum()) + for key, value in template_explanations.items(): + key_clean = ''.join(c for c in key.upper() if c.isalnum()) + if abb_clean == key_clean: + if value.get('clinical_en') and value.get('clinical_cn'): + result[abb] = value + found = True + break + + if not found: + items_need_generation.append(item) + + template_count = len(result) + print(f" ✓ 模板解释: {template_count} 个项目") + + # 3. 如果有需要生成的项目,调用DeepSeek + if items_need_generation and api_key: + print(f" ⏳ 需要DeepSeek生成: {len(items_need_generation)} 个项目") + + # 构建项目描述 + items_desc = [] + for item in items_need_generation: + direction = '偏高' if item['point'] in ['↑', 'H', '高'] else '偏低' + desc = f"- {item['abb']}: {item['name']}, 结果: {item['result']}" + if item.get('unit'): + desc += f" {item['unit']}" + desc += f" ({direction})" + if item.get('reference'): + desc += f", 参考范围: {item['reference']}" + items_desc.append(desc) + + prompt = f"""你是一位医学检验专家,请为以下异常检测项目生成临床意义解释。 + +## 异常项目: +{chr(10).join(items_desc)} + +## 要求: +1. 为每个项目提供英文和中文的临床意义解释 +2. 解释应包含:该指标的作用、异常时可能的原因和健康影响 +3. 语言专业但易于理解 +4. 每个解释约30-60字 + +## 输出格式(JSON): +```json +{{ + "ABB1": {{ + "clinical_en": "English clinical significance explanation...", + "clinical_cn": "中文临床意义解释..." + }}, + "ABB2": {{ + "clinical_en": "...", + "clinical_cn": "..." 
+ }} +}} +``` + +只返回JSON,不要其他说明。使用项目的ABB缩写作为key。""" + + try: + response = call_deepseek_api(prompt, api_key) + + # 解析 JSON + if '```json' in response: + response = response.split('```json')[1].split('```')[0] + elif '```' in response: + response = response.split('```')[1].split('```')[0] + + generated = json.loads(response.strip()) + result.update(generated) + print(f" ✓ DeepSeek生成: {len(generated)} 个项目") + except Exception as e: + print(f" ✗ DeepSeek生成失败: {e}") + elif items_need_generation: + print(f" ⚠️ {len(items_need_generation)} 个项目无模板解释且无API Key") + + print(f" 📊 解释来源统计: 模板 {template_count} 个, DeepSeek {len(result) - template_count} 个") + return result + + +def generate_and_fill_health_content(doc, matched_data: dict, api_key: str, call_deepseek_api): + """根据异常项生成并填充健康评估和建议内容 + + V2版本: + - 整体健康情况分析:功能医学整体观视角,包含优势指标和异常指标,四大系统分析 + - 医学干预建议:BHRT + IVNT + MSC 三大板块,紧扣前期问题给出解决方案 + """ + if not api_key: + print(" 未提供DeepSeek API Key,跳过健康内容生成") + return + + print(" 正在收集异常项...") + abnormal_items = collect_all_abnormal_items(matched_data, api_key) + if not abnormal_items: + print(" 没有检测到异常项目,跳过内容生成") + return + print(f" 发现 {len(abnormal_items)} 个异常项目") + + # 0. 先生成临床意义解释 + print(" 正在调用DeepSeek生成临床意义解释...") + item_explanations = generate_item_explanations(abnormal_items, api_key, call_deepseek_api) + + # 1. 使用V2版本生成整体健康情况分析 + print(" 正在调用DeepSeek生成整体健康情况分析(V2)...") + from health_assessment_v2 import generate_health_assessment_v2, fill_health_assessment_v2 + assessment_result = generate_health_assessment_v2(matched_data, api_key, call_deepseek_api) + + # 2. 
使用V2版本生成医学干预建议(替代原有的功能医学健康建议) + print(" 正在调用DeepSeek生成医学干预建议(V2)...") + from medical_intervention_v2 import generate_medical_intervention_v2, fill_medical_intervention_v2 + + # 收集所有检测项目用于性别检测 + all_items = [] + for module_name, module_data in matched_data.items(): + if isinstance(module_data, dict) and 'items' in module_data: + all_items.extend(module_data['items']) + + intervention_result = generate_medical_intervention_v2(abnormal_items, api_key, call_deepseek_api, all_items) + + # 3. 生成功能医学健康建议 + print(" 正在调用DeepSeek生成功能医学健康建议...") + advice_result = generate_functional_health_advice(abnormal_items, api_key, call_deepseek_api) + + # 4. 按从后往前的顺序填充(避免位置偏移问题) + # 先填充功能医学健康建议(位置最后) + if advice_result and advice_result.get('sections'): + print(" 正在填充功能医学健康建议...") + fill_functional_health_advice_section(doc, advice_result) + + # 再填充医学干预建议(位置靠后)- 使用V2版本 + if intervention_result: + print(" 正在填充医学干预建议(V2)...") + fill_medical_intervention_v2(doc, intervention_result) + + # 再填充整体健康评估(位置靠前)- 使用V2版本 + if assessment_result: + print(" 正在填充整体健康情况分析(V2)...") + fill_health_assessment_v2(doc, assessment_result) + + # 最后填充异常指标汇总(位置最前) + print(" 正在填充异常指标汇总...") + fill_abnormal_index_summary(doc, abnormal_items, item_explanations) + + print(" 健康内容生成完成") diff --git a/backend/list_all_config_items.py b/backend/list_all_config_items.py new file mode 100644 index 0000000..24d98a5 --- /dev/null +++ b/backend/list_all_config_items.py @@ -0,0 +1,32 @@ +"""列出配置文件中所有模块和项目""" +import json + +with open('abb_mapping_config.json', 'r', encoding='utf-8') as f: + config = json.load(f) + +modules = config.get('modules', {}) +total = 0 + +print("=" * 80) +print("配置文件中的所有检测项目(按模块分类)") +print("=" * 80) + +for name, data in modules.items(): + cn_name = data.get('cn_name', '') + pages = data.get('pages', '') + items = data.get('items', []) + + print(f"\n### {name} ({cn_name}) - 页码: {pages}") + print("-" * 60) + print(f"| {'序号':<4} | {'ABB':<20} | {'项目名称':<15} |") + print("-" * 60) + + for i, item in 
enumerate(items, 1): + abb = item.get('abb', '') + project_cn = item.get('project_cn', '') + print(f"| {i:<4} | {abb:<20} | {project_cn:<15} |") + total += 1 + +print("\n" + "=" * 80) +print(f"总计: {len(modules)} 个模块, {total} 个检测项目") +print("=" * 80) diff --git a/backend/main.py b/backend/main.py new file mode 100644 index 0000000..b0b8346 --- /dev/null +++ b/backend/main.py @@ -0,0 +1,144 @@ +from fastapi import FastAPI, UploadFile, File, HTTPException, Form +from fastapi.middleware.cors import CORSMiddleware +from fastapi.responses import FileResponse +from typing import List +import os +import tempfile +from pathlib import Path +from dotenv import load_dotenv + +# 加载环境变量 +load_dotenv() + +from services.ocr_service import OCRService +from services.llm_service import LLMService +from services.report_integrator import ReportIntegrator +from services.pdf_service import PDFService +from services.template_service import TemplateService +from services.batch_report_service import BatchReportService +from services.deepseek_health_service import DeepSeekHealthService + +app = FastAPI(title="医疗报告分析系统") + +# CORS配置 +app.add_middleware( + CORSMiddleware, + allow_origins=["http://localhost:5173", "http://localhost:3000"], + allow_credentials=True, + allow_methods=["*"], + allow_headers=["*"], +) + +# 初始化服务(仅用于综合报告流程) +ocr_service = OCRService() +llm_service = LLMService() +report_integrator = ReportIntegrator(llm_service) +pdf_service = PDFService() +template_service = TemplateService() +batch_service = BatchReportService(ocr_service, llm_service, pdf_service, template_service) + +# 检查 DeepSeek 配置状态 +deepseek_key = os.getenv("DEEPSEEK_API_KEY", "") +if deepseek_key: + print("✓ DeepSeek API Key 已配置,健康评估和建议功能已启用") +else: + print("⚠ DeepSeek API Key 未配置,健康评估和建议功能将被跳过") + + +@app.get("/") +async def root(): + return {"message": "医疗报告分析系统API", "status": "running"} + + +@app.post("/api/comprehensive-report") +async def generate_comprehensive_report( + files: List[UploadFile] = 
File(...), + patient_name: str = Form(default="患者") +): + """ + 批量上传并生成综合健康报告 + 注意:上传的文件不会永久存储,处理完后自动删除 + """ + temp_files = [] + + try: + # 验证文件类型 + allowed_extensions = {".pdf", ".jpg", ".jpeg", ".png", ".bmp"} + + # 临时保存上传的文件 + for file in files: + file_ext = Path(file.filename).suffix.lower() + + if file_ext not in allowed_extensions: + raise HTTPException( + status_code=400, + detail=f"不支持的文件格式: {file.filename}" + ) + + # 创建临时文件 + temp_file = tempfile.NamedTemporaryFile( + delete=False, + suffix=file_ext, + dir=str(batch_service.temp_dir) + ) + + # 写入文件内容 + content = await file.read() + temp_file.write(content) + temp_file.close() + + temp_files.append(temp_file.name) + + # 批量处理报告 + result = batch_service.process_multiple_reports( + file_paths=temp_files, + patient_name=patient_name + ) + + return { + "success": True, + "patient_name": result["patient_name"], + "report_count": result["report_count"], + "pdf_filename": Path(result["pdf_path"]).name, + "pdf_path": result["pdf_path"], + "generated_at": result["generated_at"], + # 包含健康评估和建议内容 + "health_assessment": result.get("analysis", {}).get("health_assessment", {}), + "health_advice": result.get("analysis", {}).get("health_advice", {}) + } + + except HTTPException: + # 清理临时文件 + batch_service._cleanup_temp_files(temp_files) + raise + except Exception as e: + # 清理临时文件 + batch_service._cleanup_temp_files(temp_files) + import traceback + error_detail = f"综合报告生成失败: {str(e)}\n{traceback.format_exc()}" + print(error_detail) # 打印完整错误堆栈 + raise HTTPException(status_code=500, detail=f"综合报告生成失败: {str(e)}") + + +@app.get("/api/comprehensive-report/download/{pdf_filename}") +async def download_comprehensive_report(pdf_filename: str): + """下载综合健康报告""" + try: + pdf_path = pdf_service.output_dir / pdf_filename + + if not pdf_path.exists(): + raise HTTPException(status_code=404, detail="PDF文件不存在") + + return FileResponse( + path=str(pdf_path), + media_type="application/pdf", + filename=pdf_filename + ) + + except Exception as 
e: + raise e if isinstance(e, HTTPException) else HTTPException(status_code=500, detail=f"下载失败: {str(e)}") + + +if __name__ == "__main__": + import uvicorn + uvicorn.run(app, host="0.0.0.0", port=8001) diff --git a/backend/medical_intervention_v2.py b/backend/medical_intervention_v2.py new file mode 100644 index 0000000..7db47e5 --- /dev/null +++ b/backend/medical_intervention_v2.py @@ -0,0 +1,1486 @@ +""" +医学干预建议模块 V2 +基于医疗干预模板(1).docx重新设计的功能医学健康建议生成器 + +结构: +1. 总述(核心干预方向+原则) +2. 生物同源性荷尔蒙调理 (BHRT) +3. 静脉营养点滴组合 (IVNT) +4. 细胞再生疗法 (MSC) + +每个板块按"关联前期问题→方案细节→价值小结"展开 +""" + +import json +import re +from typing import Dict, List, Any, Tuple +from docx.oxml import OxmlElement +from docx.oxml.ns import qn + + +# 三大板块的固定结构 +INTERVENTION_SECTIONS = { + 'BHRT': { + 'title_en': 'Hormone-Centered Improvement Plan', + 'title_cn': '生物同源性荷尔蒙调理', + 'number': '1', + 'subsections': [ + {'id': '1.1', 'title_en': 'Thyroid-axis optimization', 'title_cn': '甲状腺轴优化'}, + {'id': '1.2', 'title_en': 'Sex-steroid / perimenopause support', 'title_cn': '性激素与围绝经管理'}, + ] + }, + 'IVNT': { + 'title_en': 'IVNT Drip Selection', + 'title_cn': '建议静脉营养点滴组合', + 'number': '2', + 'subsections': [ + {'id': 'detox', 'title_en': 'Liver Detox Protocol', 'title_cn': '肝胆排毒'}, + {'id': 'immune', 'title_en': 'Immune Activation Therapy', 'title_cn': '免疫激活疗法'}, + {'id': 'ovarian', 'title_en': 'Ovarian Care Formula', 'title_cn': '卵巢呵护配方'}, + {'id': 'nad', 'title_en': 'Multi-Nutrient Advanced NAD+', 'title_cn': 'NAD+能量支持'}, + ] + }, + 'MSC': { + 'title_en': 'Cellular Regeneration', + 'title_cn': '细胞再生疗法(干细胞)', + 'number': '3', + 'subsections': [ + {'id': '3.1', 'title_en': 'Clinical positioning', 'title_cn': '临床定位'}, + {'id': '3.2', 'title_en': 'Immunomodulatory mechanisms', 'title_cn': '免疫调节机制'}, + {'id': '3.3', 'title_en': 'Ovarian function support', 'title_cn': '协助卵巢功能恢复'}, + {'id': '3.4', 'title_en': 'Synergy with BHRT', 'title_cn': '与BHRT协同'}, + ] + } +} + + +def detect_gender_from_items(all_items: List[Dict]) -> str: + """ + 根据检测项目判断性别 + + 检测逻辑: + - 
如果有 TPSA/FPSA(前列腺特异性抗原)→ 男性 + - 如果有 AMH/CA125/CA15-3/SCC(女性特异性肿瘤标志物)→ 女性 + - 如果有 E2(雌二醇)且值 > 50 → 女性(男性E2通常 < 50 pg/mL) + - 默认:女性 + + Returns: + 'male' 或 'female' + """ + male_specific = {'TPSA', 'FPSA', 'PSA'} + female_specific = {'AMH', 'CA125', 'CA15-3', 'SCC'} + + for item in all_items: + abb = item.get('abb', '').upper() + + # 男性特异性检测项目 + if abb in male_specific: + return 'male' + + # 女性特异性检测项目 + if abb in female_specific: + return 'female' + + # E2(雌二醇)判断 + if abb == 'E2': + try: + result = item.get('result', '') + if result: + value = float(str(result).replace('<', '').replace('>', '').strip()) + if value > 50: + return 'female' + except: + pass + + return 'female' # 默认女性 + + +def categorize_abnormal_items(abnormal_items: List[Dict], api_key: str = None, call_deepseek_api = None, all_items: List[Dict] = None) -> Dict[str, List[Dict]]: + """ + 将异常指标按24个模块分类(基于abb_mapping_config.json) + + 每个异常项已经有module字段(来自matched_data),直接使用该字段进行分类。 + 对于没有module字段的项目,使用DeepSeek进行智能分类到24个模块之一。 + + 荷尔蒙项目会根据检测到的性别路由到正确的模块(Female Hormone 或 Male Hormone)。 + + Args: + abnormal_items: 异常指标列表 + api_key: DeepSeek API Key + call_deepseek_api: API调用函数 + all_items: 所有检测项目(用于性别检测) + + Returns: + Dict[str, List[Dict]]: 模块名称 -> 异常项列表 + """ + from config import load_abb_config, normalize_abb, normalize_module_name + + # 加载24个模块配置 + abb_config = load_abb_config() + modules = abb_config.get('modules', {}) + abb_to_module = abb_config.get('abb_to_module', {}) + + # 检测性别 + items_for_detection = all_items if all_items else abnormal_items + detected_gender = detect_gender_from_items(items_for_detection) + print(f" [性别检测] 检测结果: {'男性' if detected_gender == 'male' else '女性'}") + + # 根据性别确定荷尔蒙项目应该分配到的模块 + hormone_target_module = 'Male Hormone' if detected_gender == 'male' else 'Female Hormone' + hormone_wrong_module = 'Female Hormone' if detected_gender == 'male' else 'Male Hormone' + + # 荷尔蒙相关的ABB(这些项目在男性和女性荷尔蒙模块中都可能出现) + hormone_abbs = {'E2', 'PROG', 'FSH', 'LH', 'PRL', 'T', 'DHEAS', 'COR', 
'CORTISOL', 'IGF-1', 'AMH'} + + # 初始化所有模块的分类字典 + categories = {module_name: [] for module_name in modules.keys()} + unclassified = [] # 无法自动分类的指标 + + for item in abnormal_items: + abb = item.get('abb', '') + module = item.get('module', '') + abb_upper = abb.upper() + + # 特殊处理:荷尔蒙项目根据性别路由 + if abb_upper in hormone_abbs: + item['module'] = hormone_target_module + categories[hormone_target_module].append(item) + continue + + # 1. 首先检查item自带的module字段 + if module: + # 标准化模块名称 + normalized_module = normalize_module_name(module, abb_config) + # 如果是错误的荷尔蒙模块,修正为正确的 + if normalized_module == hormone_wrong_module: + normalized_module = hormone_target_module + if normalized_module in categories: + categories[normalized_module].append(item) + continue + + # 2. 尝试通过ABB查找模块 + normalized_abb = normalize_abb(abb, abb_config) + found_module = abb_to_module.get(normalized_abb.upper()) or abb_to_module.get(abb.upper()) + + # 如果是错误的荷尔蒙模块,修正为正确的 + if found_module == hormone_wrong_module: + found_module = hormone_target_module + + if found_module and found_module in categories: + item['module'] = found_module # 补充module字段 + categories[found_module].append(item) + continue + + # 3. 
无法自动分类的收集起来 + unclassified.append(item) + + # 如果有无法分类的指标,使用DeepSeek进行分类 + if unclassified and api_key and call_deepseek_api: + print(f" 🤖 {len(unclassified)} 个指标需要DeepSeek智能分类...") + deepseek_classified = classify_with_deepseek(unclassified, list(modules.keys()), api_key, call_deepseek_api) + + # 合并分类结果 + for module_name, items in deepseek_classified.items(): + if module_name in categories: + categories[module_name].extend(items) + elif unclassified: + # 没有API时,打印警告但不默认归入任何类别 + print(f" ⚠️ {len(unclassified)} 个指标无法分类(无API Key):") + for item in unclassified: + print(f" - {item.get('name', '')} ({item.get('abb', '')})") + + # 移除空的分类 + categories = {k: v for k, v in categories.items() if v} + + return categories + + +def classify_with_deepseek(unclassified_items: List[Dict], module_names: List[str], api_key: str, call_deepseek_api) -> Dict[str, List[Dict]]: + """ + 使用DeepSeek对无法自动分类的指标进行分类到24个模块之一 + + Args: + unclassified_items: 无法自动分类的指标列表 + module_names: 24个模块名称列表 + api_key: DeepSeek API Key + call_deepseek_api: API调用函数 + + Returns: + 分类结果字典 {module_name: [items]} + """ + from config import load_abb_config + + if not unclassified_items or not api_key: + return {} + + # 加载配置获取模块中文名称 + abb_config = load_abb_config() + modules = abb_config.get('modules', {}) + + # 构建模块列表描述 + module_desc = [] + for module_name in module_names: + cn_name = modules.get(module_name, {}).get('cn_name', '') + module_desc.append(f"- {module_name} ({cn_name})") + + # 构建指标描述 + items_desc = [] + for item in unclassified_items: + direction = '偏高' if item.get('point') in ['↑', 'H', '高'] else '偏低' + desc = f"- {item.get('name', '')} ({item.get('abb', '')}): {item.get('result', '')} {item.get('unit', '')} ({direction})" + items_desc.append(desc) + + prompt = f"""请将以下医学检测指标分类到对应的模块中。 + +## 待分类指标: +{chr(10).join(items_desc)} + +## 可选模块(共24个): +{chr(10).join(module_desc)} + +## 分类规则: +1. 每个指标必须分类到上述24个模块之一 +2. 根据指标的医学属性选择最合适的模块 +3. 
不要创建新的模块,只能使用上述24个模块 + +## 输出格式(JSON): +```json +{{ + "classifications": [ + {{"abb": "指标缩写", "module": "模块英文名称"}} + ] +}} +``` + +只返回JSON,不要其他内容。""" + + try: + response = call_deepseek_api(prompt, api_key, max_tokens=1000, timeout=30) + + if response is None: + return {} + + # 解析JSON + if '```json' in response: + response = response.split('```json')[1].split('```')[0] + elif '```' in response: + response = response.split('```')[1].split('```')[0] + + result = json.loads(response.strip()) + classifications = result.get('classifications', []) + + # 构建分类结果 + classified = {} + + # 创建ABB到item的映射 + abb_to_item = {item.get('abb', '').upper(): item for item in unclassified_items} + + for cls in classifications: + abb = cls.get('abb', '').upper() + module = cls.get('module', '') + + if abb in abb_to_item and module in module_names: + if module not in classified: + classified[module] = [] + item = abb_to_item[abb] + item['module'] = module # 补充module字段 + classified[module].append(item) + print(f" ✓ {item.get('name', '')} ({abb}) -> {module}") + + return classified + + except Exception as e: + print(f" ⚠️ DeepSeek分类失败: {e}") + return {} + + +def build_problem_summary(categories: Dict[str, List[Dict]]) -> str: + """ + 构建问题摘要,用于prompt + 基于24个模块分类构建摘要 + """ + from config import load_abb_config + + # 加载配置获取模块中文名称 + abb_config = load_abb_config() + modules = abb_config.get('modules', {}) + + summary_parts = [] + + for module_name, items in categories.items(): + if not items: + continue + + # 获取模块中文名称 + cn_name = modules.get(module_name, {}).get('cn_name', module_name) + + # 构建该模块的描述 + desc = f"【{cn_name}】({module_name}):" + "、".join([ + f"{item['name']}({item['abb']}) {item['result']}{item.get('unit', '')} {'↑' if item['point'] in ['↑', 'H', '高'] else '↓'}" + for item in items[:5] + ]) + if len(items) > 5: + desc += f" 等{len(items)}项" + summary_parts.append(desc) + + return "\n".join(summary_parts) if summary_parts else "暂无明显异常" + + +def build_intervention_prompt(abnormal_items: 
List[Dict], categories: Dict[str, List[Dict]]) -> str: + """ + 构建医学干预建议的Prompt(V2版本 - 基于案例文档优化) + """ + problem_summary = build_problem_summary(categories) + + # 构建详细的异常指标列表 + abnormal_details = [] + for item in abnormal_items: + direction = '偏高' if item['point'] in ['↑', 'H', '高'] else '偏低' + detail = f"- {item['name']} ({item['abb']}): {item['result']} {item.get('unit', '')} ({direction})" + if item.get('reference'): + detail += f" [参考: {item['reference']}]" + abnormal_details.append(detail) + + prompt = f"""# 角色设定 +你是Be.U Med功能医学团队的资深医学顾问,在功能医学、抗衰老医学、血液净化疗法、静脉营养疗法(IVNT)、生物同源性荷尔蒙疗法(BHRT)、干细胞再生医学领域具有丰富的临床经验。 + +# 任务 +根据体检者的异常指标,撰写个性化的「医学干预」建议方案。 + +# 体检者问题摘要 +{problem_summary} + +# 异常指标详情 +{chr(10).join(abnormal_details)} + +# 核心原则(必须严格遵守) + +## 1. 段落格式(极其重要!严格遵守字数限制!) +- **每个子板块必须包含2-4个独立的内容段落,不要合并成一整段** +- **每个内容段落必须先写英文,再写对应的中文** +- **英文段落:50-80词,绝对不能超过80词** +- **中文段落:70-120字,绝对不能超过120字(硬性上限140字)** +- 不要英中混排,必须分开 +- **禁止将所有内容写成一个超长段落** +- **如果内容较多,必须拆分成多个段落,每个段落控制在限制内** + +## 2. 逻辑要求 +- 不重复指标解读(前期报告分析已完成) +- 紧扣前期问题给出对应解决方案 +- 清晰传递产品核心价值与协同作用 +- 体现"先稳基础、再精准干预"的功能医学思路 + +## 3. 语气要求 +- 客观陈述方案原理与实际价值 +- 使用"建议""支持""助力""帮助""有助于"等引导词 +- **严禁使用任何不确定表述**,包括但不限于: + - 中文禁用词:可能、也许、或许、大概、似乎、有概率、有可能、可能会、或可、疑似、倾向于、趋向于、不排除、有待、存在...的可能 + - 英文禁用词:may, might, could, possibly, probably, perhaps, likely, potentially, tend to, appear to, seem to, it is possible that +- 禁用强硬/绝对化表述(如"必须""一定""保证""治愈") +- 不夸大效果,实事求是 + +## 4. 模块总结要求(极其重要!必须遵守!) +- **每个区域模块必须包含intro字段,放在区域标题后、子标题前** +- **intro字段是JSON中的必填字段,格式为:{{"en": "英文", "cn": "中文"}}** +- 总结段落约120-140字(中文),对应英文约80-100词 +- 总结内容:概括该模块要解决的核心问题、干预方向、预期价值 +- 格式:先英文总结,再中文总结 +- 位置:区域标题 → intro模块总结 → 子标题1 → 子标题2... + +# 文章结构(必须严格遵循案例格式) + +## 总述部分(固定模板+动态填充) +- **标题**: `Medical Intervention 「医学干预」建议方案` + +**英文总述(固定模板)**: +"The current core medical intervention priorities are: `{{动态:干预方向,如 Vascular health (dyslipidemia + elevated lipoprotein(a)) → Thyroid autoimmune regulation → Iron metabolism balance → Hormonal homeostasis}}`." 
+ +"Based on this, the principle guiding your proposed medical intervention plan is to first \"reduce vascular risk and oxidative stress\", followed by \"immune regulation and hormonal optimization\". Your issue does not lie with acute organ dysfunction, but rather with \"`{{动态:具体异常组合,如 elevated lipoprotein(a) + thyroid autoimmune activation + iron overload + mild androgen imbalance}}`\"—a combination of chronic risk factors that synergistically threaten vascular and endocrine health. In functional medicine, this pattern requires a multi-targeted, integrated intervention to address both symptoms and root causes." + +**中文总述(固定模板)**: +"当前核心的医疗干预方向为:`{{动态:干预方向,如 血管健康 → 荷尔蒙平衡 → 微量元素调节 → 机体抗衰}}`。基于此,针对您的「医学干预」建议方案的原则是先"降低血管风险与氧化应激",再"免疫调节与荷尔蒙优化"。`{{动态:问题定性,如 您的问题并非急性脏器功能障碍,而是"脂蛋白(a)升高 + 甲状腺自身免疫活化 + 铁过载 + 轻度雄激素失衡"的慢性风险组合,这些因素协同威胁血管与内分泌健康}}`。在功能医学上,此类格局需要多靶点、一体化干预,实现标本兼顾。" + +**动态部分生成规则**: +- 干预方向:根据异常指标所属的系统生成,格式为 `A → B → C → D` +- 具体异常组合:列出主要异常指标的英文描述 +- 问题定性:用中文描述异常指标的组合特征 + +## 板块选择(根据异常指标动态选择2-4个板块) + +### 板块A: Vascular Protection & Metabolic Regulation 血管保护与代谢调控 +适用于:血脂异常、脂蛋白(a)升高、胆固醇异常、同型半胱氨酸升高 + +A.1 Blood Purification Therapy (DFPP + Ozone) 血液净化疗法 +- 疗法定位(英文→中文) +- DFPP深度血液净化:每6个月1次,共2次,每次120分钟(英文→中文) +- Ozone轻盈血液净化:每月1次,共6次,每次30分钟(英文→中文) +- 干预意义(英文→中文) + +A.2 IVNT Nutritional Intervention 静脉营养点滴干预 +- IVNT定位说明(英文→中文) +- IVNT – 肝胆排毒 (3次/疗程):维生素C、谷胱甘肽、α-硫辛酸(英文→中文) +- IVNT – 免疫激活疗法 (3次/疗程):B族维生素、辅酶Q10、NAC(英文→中文) +- IVNT – 血管保护配方 (3次/疗程):叶酸、B6、B12、Omega-3(英文→中文) +- 干预意义(英文→中文) + +### 板块B: Immune Regulation & Thyroid Protection 免疫调节与甲状腺保护 +适用于:甲状腺抗体升高(TgAb/TPOAb)、自身免疫活化、甲状腺功能异常 + +B.1 Lymphocyte Therapy 淋巴细胞疗法 +- 疗法定位:血液净化与IVNT干预1个月后启动,每年1次(英文→中文) +- 免疫调节机制:Treg细胞作用(英文→中文) +- 干预意义(英文→中文) + +B.2 BHRT – Bioidentical Hormone Therapy 生物同源性荷尔蒙调理 +- 方案定位:靶向调节而非外源替代(英文→中文) +- 甲状腺轴优化:硒200μg/日、维生素D3 5000IU/日(英文→中文) +- 性激素轴调节(英文→中文) +- 干预意义(英文→中文) + +### 板块C: Iron Metabolism Balance & Lifestyle Optimization 铁代谢平衡与生活方式优化 +适用于:铁蛋白异常、铁过载、需要生活方式干预 + +C.1 ONS (Oral Nutritional 
Support) 口服营养支持 +- 方案定位:定制6个月(英文→中文) +- 铁代谢调节:乳铁蛋白拮抗剂+维生素E 400IU/日(英文→中文) +- 血管保护辅助:辅酶Q10 200mg/日+植物甾醇2g/日(英文→中文) + +C.2 Lifestyle Optimization 生活方式优化 +- 饮食建议:具体食物推荐和禁忌(英文→中文) +- 运动建议:每周3次有氧+每周2次阻力训练(英文→中文) +- 作息与压力管理:7-8小时睡眠+15分钟冥想(英文→中文) +- 干预意义(英文→中文) + +### 板块D: Cellular Regeneration 细胞再生疗法(干细胞) +适用于:多系统慢性问题、作为其他干预的协同增效 + +D.1 Clinical Positioning 临床定位 +- MSC疗法定位:再生修复+免疫调节,核心干预3个月后启动,每6个月1次,共2次(英文→中文) +- MSC作用机制(英文→中文) + +D.2 Synergy with Other Interventions 与其他干预的协同 +- 血管保护协同(英文→中文) +- 甲状腺支持协同(英文→中文) +- 荷尔蒙优化协同(英文→中文) + +## 签名部分(右对齐,空一行后显示) +Functional Medical Team from Be.U Med +Be.U Med 功能医学团队 +[当前日期,格式:YYYY年MM月DD日] + +# 输出格式(JSON) + +```json +{{ + "overview": {{ + "title_en": "Medical Intervention", + "title_cn": "「医学干预」建议方案", + "content_en": "使用固定模板填充动态部分。第一句:The current core medical intervention priorities are: [动态:干预方向]。第二句:Based on this, the principle guiding your proposed medical intervention plan is to first \"reduce vascular risk and oxidative stress\", followed by \"immune regulation and hormonal optimization\". Your issue does not lie with acute organ dysfunction, but rather with \"[动态:具体异常组合]\"—a combination of chronic risk factors that synergistically threaten vascular and endocrine health. In functional medicine, this pattern requires a multi-targeted, integrated intervention to address both symptoms and root causes.", + "content_cn": "使用固定模板填充动态部分。当前核心的医疗干预方向为:[动态:干预方向]。基于此,针对您的「医学干预」建议方案的原则是先"降低血管风险与氧化应激",再"免疫调节与荷尔蒙优化"。[动态:问题定性]。在功能医学上,此类格局需要多靶点、一体化干预,实现标本兼顾。" + }}, + "sections": [ + {{ + "number": "1", + "title_en": "Vascular Protection & Metabolic Regulation", + "title_cn": "血管保护与代谢调控", + "intro": {{ + "en": "模块总结英文(80-100词):概括该模块要解决的核心问题、干预方向、预期价值...", + "cn": "模块总结中文(120-140字):概括该模块要解决的核心问题、干预方向、预期价值..." 
+ }}, + "subsections": [ + {{ + "id": "1.1", + "title_en": "Blood Purification Therapy (DFPP + Ozone)", + "title_cn": "血液净化疗法(DFPP + Ozone)", + "paragraphs": [ + {{ + "en": "英文段落1:疗法定位和概述(50-80词)...", + "cn": "中文段落1:对应翻译(70-120字,不超过140字)..." + }}, + {{ + "en": "英文段落2:DFPP疗法详情,包含频次、疗程、时长、作用机制...", + "cn": "中文段落2:对应翻译..." + }}, + {{ + "en": "英文段落3:Ozone疗法详情...", + "cn": "中文段落3:对应翻译..." + }}, + {{ + "en": "Intervention Significance: 干预意义英文...", + "cn": "干预意义: 中文..." + }} + ] + }}, + {{ + "id": "1.2", + "title_en": "IVNT Nutritional Intervention", + "title_cn": "静脉营养点滴干预", + "paragraphs": [...] + }} + ] + }}, + {{ + "number": "2", + "title_en": "Immune Regulation & Thyroid Protection", + "title_cn": "免疫调节与甲状腺保护", + "intro": {{ + "en": "模块总结英文...", + "cn": "模块总结中文..." + }}, + "subsections": [...] + }} + ], + "signature": {{ + "team_en": "Functional Medical Team from Be.U Med", + "team_cn": "Be.U Med 功能医学团队", + "date": "YYYY年MM月DD日(使用当前实际日期)" + }} +}} +``` + +# 重要提示 +1. **每个子板块必须包含2-4个独立段落,禁止合并成一整段** +2. **中文段落严格控制在70-120字,绝对不能超过140字** +3. **英文段落严格控制在50-80词,绝对不能超过80词** +4. **内容必须与体检者的具体异常指标关联** +5. **根据异常指标类型选择合适的板块**(不是所有板块都需要,选择2-4个最相关的) +6. **每个疗法都要说明具体的频次、疗程、核心成分** +7. **每个内容点先英文后中文,不要混排** +8. **强调各干预之间的协同作用** +9. **语气始终保持专业、客观、温和** +10. **【极其重要】每个sections数组中的区域模块必须包含intro字段!这是必填字段!** +11. **【极其重要】intro字段格式:{{"en": "英文总结80-100词", "cn": "中文总结120-140字"}}** +12. **【极其重要】intro内容:概括该模块要解决的核心问题、干预方向、预期价值** +13. 
**只返回JSON,不要其他内容**""" + + return prompt + + + + +def generate_medical_intervention_v2(abnormal_items: List[Dict], api_key: str, call_deepseek_api, all_items: List[Dict] = None) -> dict: + """ + 生成医学干预建议内容(V2版本) + + Args: + abnormal_items: 异常项列表 + api_key: DeepSeek API Key + call_deepseek_api: API调用函数 + all_items: 所有检测项目(用于性别检测) + + Returns: + 医学干预建议内容字典 + """ + if not api_key: + print(" ⚠️ 未提供API Key,跳过医学干预建议生成") + return {} + + if not abnormal_items: + print(" ⚠️ 没有异常指标,跳过医学干预建议生成") + return {} + + # 分类异常指标(传入API参数以支持DeepSeek智能分类,传入all_items用于性别检测) + categories = categorize_abnormal_items(abnormal_items, api_key, call_deepseek_api, all_items) + + # 打印分类统计 + print(f" 📊 问题分类统计(共 {len(categories)} 个模块有异常):") + for module_name, items in categories.items(): + if items: + print(f" {module_name}: {len(items)} 项") + + # 构建prompt + prompt = build_intervention_prompt(abnormal_items, categories) + + def parse_json_response(response_text): + """解析JSON响应""" + if '```json' in response_text: + response_text = response_text.split('```json')[1].split('```')[0] + elif '```' in response_text: + response_text = response_text.split('```')[1].split('```')[0] + + response_text = response_text.strip() + + try: + return json.loads(response_text) + except json.JSONDecodeError: + pass + + # 尝试修复 + if response_text.count('"') % 2 != 0: + response_text += '"' + + open_braces = response_text.count('{') - response_text.count('}') + open_brackets = response_text.count('[') - response_text.count(']') + + if open_brackets > 0: + if open_braces > 0: + response_text += '}' * open_braces + response_text += ']' * open_brackets + elif open_braces > 0: + response_text += '}' * open_braces + + try: + return json.loads(response_text) + except json.JSONDecodeError: + return None + + # 最多重试3次 + for attempt in range(3): + try: + print(f" 🤖 调用DeepSeek生成医学干预建议... 
(第{attempt+1}次)") + response = call_deepseek_api(prompt, api_key, max_tokens=8000, timeout=240) + + if response is None: + if attempt < 2: + print(f" ⚠️ API请求失败,重试中...") + import time + time.sleep(3) + continue + + result = parse_json_response(response) + + if result and (result.get('overview') or result.get('sections') or result.get('bhrt') or result.get('ivnt') or result.get('msc')): + # 检查并补充缺失的intro字段 + if result.get('sections'): + for section in result['sections']: + if not section.get('intro'): + title_en = section.get('title_en', '') + title_cn = section.get('title_cn', '') + print(f" ⚠️ 区域 '{title_en}' 缺少intro,将自动生成") + # 生成默认intro + section['intro'] = { + 'en': f"This section focuses on {title_en.lower()}, addressing the specific abnormalities identified in your test results and providing targeted intervention strategies.", + 'cn': f"本模块聚焦于{title_cn},针对您检测结果中发现的具体异常指标,提供靶向干预策略。" + } + + print(f" ✓ 成功生成医学干预建议") + return result + + if attempt < 2: + print(f" ⚠️ 响应格式不完整,重试中...") + + except Exception as e: + if attempt < 2: + print(f" ⚠️ 生成失败: {e},重试中...") + + print(f" ✗ 生成医学干预建议失败") + return {} + + +def convert_intervention_to_sections(intervention_result: dict) -> dict: + """ + 将V2格式转换为sections格式,以便复用现有的填充函数 + 支持两种JSON格式: + 1. 新格式(带sections数组) + 2. 旧格式(bhrt/ivnt/msc分开) + """ + sections = [] + + # 1. 
总述部分 + overview = intervention_result.get('overview', {}) + if overview: + paragraphs = [] + + # 新格式:content_en/content_cn + if overview.get('content_en') or overview.get('content_cn'): + paragraphs.append({ + 'en': overview.get('content_en', ''), + 'cn': overview.get('content_cn', '') + }) + else: + # 旧格式:direction + principle + if overview.get('direction_en') or overview.get('direction_cn'): + paragraphs.append({ + 'en': overview.get('direction_en', ''), + 'cn': overview.get('direction_cn', '') + }) + if overview.get('principle_en') or overview.get('principle_cn'): + paragraphs.append({ + 'en': overview.get('principle_en', ''), + 'cn': overview.get('principle_cn', '') + }) + + if paragraphs: + sections.append({ + 'title_en': overview.get('title_en', 'Medical Intervention'), + 'title_cn': overview.get('title_cn', '「医学干预」建议方案'), + 'paragraphs': paragraphs, + 'is_main_title': True + }) + + # 2. 检查是否使用新格式(sections数组) + if intervention_result.get('sections'): + for section in intervention_result['sections']: + section_paragraphs = [] + + # 处理模块总结(intro)- 在子标题之前 + intro = section.get('intro', {}) + if intro and (intro.get('en') or intro.get('cn')): + section_paragraphs.append({ + 'en': intro.get('en', ''), + 'cn': intro.get('cn', ''), + 'is_intro': True + }) + else: + # 如果没有intro,打印警告 + title = section.get('title_en', section.get('title_cn', '')) + print(f" ⚠️ 区域 '{title}' 缺少intro字段") + + # 处理子板块 + for subsection in section.get('subsections', []): + # 子标题 - 英文带标号,中文不带标号 + # 格式:1.2 Sex-steroid / early perimenopause support | 性激素与"早衰/围绝经提前"管理 + sub_id = subsection.get('id', '') + sub_title_en = f"{sub_id} {subsection.get('title_en', '')}".strip() + sub_title_cn = subsection.get('title_cn', '') # 中文不加标号 + section_paragraphs.append({ + 'en': sub_title_en, + 'cn': sub_title_cn, + 'is_subtitle': True + }) + + # 处理段落(新格式使用paragraphs数组) + for para in subsection.get('paragraphs', []): + if para.get('en') or para.get('cn'): + section_paragraphs.append({ + 'en': para.get('en', 
''), + 'cn': para.get('cn', '') + }) + + # 兼容旧格式(content_en/content_cn) + if subsection.get('content_en') or subsection.get('content_cn'): + section_paragraphs.append({ + 'en': subsection.get('content_en', ''), + 'cn': subsection.get('content_cn', '') + }) + + if section_paragraphs: + number = section.get('number', '') + title_en = f"{number}) {section.get('title_en', '')}".strip() if number else section.get('title_en', '') + title_cn = section.get('title_cn', '') + sections.append({ + 'title_en': title_en, + 'title_cn': title_cn, + 'paragraphs': section_paragraphs + }) + else: + # 3. 旧格式:BHRT板块 + bhrt = intervention_result.get('bhrt', {}) + if bhrt: + paragraphs = [] + + if bhrt.get('intro_en') or bhrt.get('intro_cn'): + paragraphs.append({ + 'en': bhrt.get('intro_en', ''), + 'cn': bhrt.get('intro_cn', '') + }) + + for sub in bhrt.get('subsections', []): + sub_title_en = f"{sub.get('id', '')} {sub.get('title_en', '')}" + sub_title_cn = f"{sub.get('id', '')} {sub.get('title_cn', '')}" + paragraphs.append({ + 'en': sub_title_en, + 'cn': sub_title_cn, + 'is_subtitle': True + }) + if sub.get('content_en') or sub.get('content_cn'): + paragraphs.append({ + 'en': sub.get('content_en', ''), + 'cn': sub.get('content_cn', '') + }) + + if bhrt.get('value_summary_en') or bhrt.get('value_summary_cn'): + paragraphs.append({ + 'en': bhrt.get('value_summary_en', ''), + 'cn': bhrt.get('value_summary_cn', '') + }) + + if paragraphs: + sections.append({ + 'title_en': bhrt.get('title_en', '1) Hormone-Centered Improvement Plan'), + 'title_cn': bhrt.get('title_cn', '生物同源性荷尔蒙调理'), + 'paragraphs': paragraphs + }) + + # 4. 
旧格式:IVNT板块 + ivnt = intervention_result.get('ivnt', {}) + if ivnt: + paragraphs = [] + + if ivnt.get('intro_en') or ivnt.get('intro_cn'): + paragraphs.append({ + 'en': ivnt.get('intro_en', ''), + 'cn': ivnt.get('intro_cn', '') + }) + + for formula in ivnt.get('formulas', []): + freq = formula.get('frequency', '') + title_en = f"IVNT – {formula.get('name_en', '')}" + title_cn = f"IVNT – {formula.get('name_cn', '')} {freq}" + paragraphs.append({ + 'en': title_en, + 'cn': title_cn, + 'is_formula_title': True + }) + if formula.get('content_en') or formula.get('content_cn'): + paragraphs.append({ + 'en': formula.get('content_en', ''), + 'cn': formula.get('content_cn', '') + }) + + if ivnt.get('value_summary_en') or ivnt.get('value_summary_cn'): + paragraphs.append({ + 'en': ivnt.get('value_summary_en', ''), + 'cn': ivnt.get('value_summary_cn', '') + }) + + if paragraphs: + sections.append({ + 'title_en': ivnt.get('title_en', '2) IVNT Drip Selection'), + 'title_cn': ivnt.get('title_cn', '建议静脉营养点滴组合'), + 'paragraphs': paragraphs + }) + + # 5. 旧格式:MSC板块 + msc = intervention_result.get('msc', {}) + if msc: + paragraphs = [] + + for sub in msc.get('subsections', []): + sub_title_en = f"{sub.get('id', '')} {sub.get('title_en', '')}" + sub_title_cn = f"{sub.get('id', '')} {sub.get('title_cn', '')}" + paragraphs.append({ + 'en': sub_title_en, + 'cn': sub_title_cn, + 'is_subtitle': True + }) + if sub.get('content_en') or sub.get('content_cn'): + paragraphs.append({ + 'en': sub.get('content_en', ''), + 'cn': sub.get('content_cn', '') + }) + + if msc.get('value_summary_en') or msc.get('value_summary_cn'): + paragraphs.append({ + 'en': msc.get('value_summary_en', ''), + 'cn': msc.get('value_summary_cn', '') + }) + + if paragraphs: + sections.append({ + 'title_en': msc.get('title_en', '3) Cellular Regeneration'), + 'title_cn': msc.get('title_cn', '细胞再生疗法(干细胞)'), + 'paragraphs': paragraphs + }) + + # 6. 
签名(右对齐,包含日期)- 始终添加签名,即使API没有返回 + from datetime import datetime + signature = intervention_result.get('signature', {}) + + # 获取日期,优先使用API返回的日期,否则使用当前日期 + date_str = signature.get('date', '') if signature else '' + if not date_str or 'YYYY' in date_str: + date_str = datetime.now().strftime('%Y年%m月%d日') + + # 获取团队名称,使用默认值 + team_en = signature.get('team_en', 'Functional Medical Team from Be.U Med') if signature else 'Functional Medical Team from Be.U Med' + team_cn = signature.get('team_cn', 'Be.U Med 功能医学团队') if signature else 'Be.U Med 功能医学团队' + + sections.append({ + 'title_en': '', + 'title_cn': '', + 'paragraphs': [{ + 'en': team_en, + 'cn': team_cn, + 'date': date_str, + 'is_signature': True + }], + 'is_signature_section': True + }) + + return {'sections': sections} + + +# ============================================================ +# 文档填充函数 +# ============================================================ + +def clean_markdown_formatting(text: str) -> str: + """清理Markdown格式""" + if not text: + return text + text = re.sub(r'\*\*([^*]+)\*\*', r'\1', text) + text = re.sub(r'__([^_]+)__', r'\1', text) + text = re.sub(r'(?= 0: + # Medical Intervention 已存在,替换其内容(到 FHA 或 Client File 为止) + next_section_pos = len(children) + for i in range(mi_pos + 1, len(children)): + text = ''.join(children[i].itertext()).strip().lower() + if any(kw in text for kw in end_keywords): + next_section_pos = i + print(f" 找到下一区域位置: {i}") + break + + # 删除 MI 到下一区域之间的所有内容 + children = list(body) + elements_to_remove = [] + + # 检查MI前面的分页符 + for check_pos in range(max(0, mi_pos - 3), mi_pos): + if check_pos < len(children): + prev_elem = children[check_pos] + br_elem = prev_elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br') + if br_elem is not None: + break_type = br_elem.get('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type') + if break_type == 'page': + elements_to_remove.append(prev_elem) + print(f" 删除Medical Intervention前的分页符 (位置 {check_pos})") + 
+ for i in range(mi_pos, min(next_section_pos, len(children))): + elem = children[i] + if elem.tag.endswith('}sectPr'): + continue + elements_to_remove.append(elem) + + for elem in elements_to_remove: + try: + body.remove(elem) + except ValueError: + pass + + if elements_to_remove: + print(f" 已删除 {len(elements_to_remove)} 个原有内容") + + # 重新获取插入位置(在下一区域之前) + children = list(body) + insert_pos = 0 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if any(kw in text for kw in end_keywords): + insert_pos = i + break + else: + # Medical Intervention 不存在,在 FHA 或 Client File 之前插入(不删除任何内容) + insert_pos = -1 + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip().lower() + if any(kw in text for kw in end_keywords): + insert_pos = i + print(f" Medical Intervention不存在,在位置 {i} 前插入") + break + + if insert_pos < 0: + print(" 未找到插入位置") + return + + # 插入新内容 + section_idx = 0 + for section in sections: + title_en = section.get('title_en', '') + title_cn = section.get('title_cn', '') + paragraphs = section.get('paragraphs', []) + is_main = section.get('is_main_title', False) + is_signature = section.get('is_signature_section', False) + + # 跳过签名部分的标题 + if not is_signature and (title_en or title_cn): + # 只在第二个及之后的板块标题前插入空段落(第一个板块紧跟前面内容,不需要空段落) + if section_idx > 0: + empty_p = create_empty_paragraph_intervention() + body.insert(insert_pos, empty_p) + insert_pos += 1 + + # 板块标题 + title_paragraphs = create_section_title_intervention(title_en, title_cn, is_main) + for title_p in title_paragraphs: + body.insert(insert_pos, title_p) + insert_pos += 1 + + section_idx += 1 + + # 段落内容 + for para_idx, para in enumerate(paragraphs): + is_subtitle = para.get('is_subtitle', False) + is_formula = para.get('is_formula_title', False) + is_sig = para.get('is_signature', False) + is_intro = para.get('is_intro', False) + + en_text = para.get('en', '') + cn_text = para.get('cn', '') + + if is_intro: + # 模块总结样式 - 在区域标题后、子标题前,先英文后中文 + if en_text: + p_en = 
create_formatted_paragraph_intervention(en_text, is_chinese=False) + body.insert(insert_pos, p_en) + insert_pos += 1 + if cn_text: + p_cn = create_formatted_paragraph_intervention(cn_text, is_chinese=True) + body.insert(insert_pos, p_cn) + insert_pos += 1 + elif is_subtitle or is_formula: + # 子标题样式 - 上面空一行,英文和中文在同一行,加粗 + empty_p = create_empty_paragraph_intervention() + body.insert(insert_pos, empty_p) + insert_pos += 1 + + # 使用新的子标题函数,英文和中文在同一行 + p_subtitle = create_subtitle_intervention(en_text, cn_text) + body.insert(insert_pos, p_subtitle) + insert_pos += 1 + elif is_sig: + # 签名样式 - 右对齐,空一行后显示 + empty_p = create_empty_paragraph_intervention() + body.insert(insert_pos, empty_p) + insert_pos += 1 + + # 获取日期 + date_text = para.get('date', '') + + if en_text: + p_en = create_signature_paragraph(en_text, is_chinese=False) + body.insert(insert_pos, p_en) + insert_pos += 1 + if cn_text: + p_cn = create_signature_paragraph(cn_text, is_chinese=True) + body.insert(insert_pos, p_cn) + insert_pos += 1 + if date_text: + p_date = create_signature_paragraph(date_text, is_chinese=True) + body.insert(insert_pos, p_date) + insert_pos += 1 + else: + # 普通段落 - 不需要额外缩进 + if en_text: + p_en = create_formatted_paragraph_intervention(en_text, is_chinese=False) + body.insert(insert_pos, p_en) + insert_pos += 1 + if cn_text: + p_cn = create_formatted_paragraph_intervention(cn_text, is_chinese=True) + body.insert(insert_pos, p_cn) + insert_pos += 1 + + print(f" ✓ 已插入 {len(sections)} 个医学干预板块") + + # 确保"功能医学健康建议"前面有分页符(MI内容插入在FHA前,需要在MI和FHA之间加分页符) + children = list(body) + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip() + text_lower = text.lower() + if ('functional medical health advice' in text_lower or + '功能医学健康建议' in text): + already_has_break = False + if i > 0: + prev_elem = children[i - 1] + br_elem = prev_elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br') + if br_elem is not None: + break_type = 
br_elem.get('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type') + if break_type == 'page': + already_has_break = True + + if not already_has_break: + page_break = create_page_break_paragraph() + body.insert(i, page_break) + print(f" 已在'功能医学健康建议'前插入分页符") + break + + # 确保"客户功能医学检测档案"前面有分页符 + children = list(body) + for i, elem in enumerate(children): + text = ''.join(elem.itertext()).strip() + if '功能医学检测档案' in text or 'Functional Medical Examination File' in text: + # 检查前一个元素是否已经是分页符 + already_has_break = False + if i > 0: + prev_elem = children[i - 1] + br_elem = prev_elem.find('.//{http://schemas.openxmlformats.org/wordprocessingml/2006/main}br') + if br_elem is not None: + break_type = br_elem.get('{http://schemas.openxmlformats.org/wordprocessingml/2006/main}type') + if break_type == 'page': + already_has_break = True + + if not already_has_break: + # 在该元素前插入分页符 + page_break = create_page_break_paragraph() + body.insert(i, page_break) + print(f" 已在'客户功能医学检测档案'前插入分页符") + break + + +# ============================================================ +# 主入口函数 +# ============================================================ + +def generate_and_fill_medical_intervention_v2(doc, abnormal_items: List[Dict], api_key: str, call_deepseek_api): + """ + 生成并填充医学干预建议(V2版本) + + 替代原有的 generate_functional_health_advice + fill_functional_health_advice_section + """ + if not api_key: + print(" ⚠️ 未提供DeepSeek API Key,跳过医学干预建议生成") + return None + + if not abnormal_items: + print(" ⚠️ 没有异常指标,跳过医学干预建议生成") + return None + + print("\n" + "=" * 60) + print("医学干预建议 V2") + print("=" * 60) + + # 生成内容 + intervention_result = generate_medical_intervention_v2(abnormal_items, api_key, call_deepseek_api) + + if intervention_result: + # 填充到文档 + print("\n 📝 正在填充医学干预建议...") + fill_medical_intervention_v2(doc, intervention_result) + print(" ✓ 医学干预建议完成") + else: + print(" ✗ 医学干预建议生成失败") + + return intervention_result + + +# ============================================================ +# 测试 
+# ============================================================ + +if __name__ == '__main__': + # 测试分类功能 + test_items = [ + {'abb': 'TSH', 'name': '促甲状腺激素', 'result': '16.879', 'unit': 'μIU/mL', 'point': '↑', 'module': 'Thyroid Function'}, + {'abb': 'AMH', 'name': '抗缪勒管激素', 'result': '0.17', 'unit': 'ng/mL', 'point': '↓', 'module': 'Female Hormone'}, + {'abb': 'CRP', 'name': 'C反应蛋白', 'result': '5.2', 'unit': 'mg/L', 'point': '↑', 'module': 'Inflammatory Reaction'}, + {'abb': 'Hb', 'name': '血红蛋白', 'result': '105', 'unit': 'g/L', 'point': '↓', 'module': 'Complete Blood Count'}, + {'abb': 'DHEAS', 'name': '硫酸脱氢表雄酮', 'result': '45', 'unit': 'μg/dL', 'point': '↓', 'module': 'Female Hormone'}, + {'abb': 'FBS', 'name': '空腹血糖', 'result': '6.5', 'unit': 'mmol/L', 'point': '↑', 'module': 'Blood Sugar'}, + {'abb': 'TC', 'name': '总胆固醇', 'result': '6.2', 'unit': 'mmol/L', 'point': '↑', 'module': 'Lipid Profile'}, + ] + + categories = categorize_abnormal_items(test_items) + + print("=" * 60) + print("异常指标分类测试(24模块分类)") + print("=" * 60) + for module_name, items in categories.items(): + if items: + print(f"\n{module_name}: {len(items)} 项") + for item in items: + print(f" - {item['name']} ({item['abb']})") + + print("\n" + "=" * 60) + print("问题摘要:") + print("=" * 60) + print(build_problem_summary(categories)) diff --git a/backend/ocr_raw_chaonv2.txt b/backend/ocr_raw_chaonv2.txt new file mode 100644 index 0000000..63c08c5 --- /dev/null +++ b/backend/ocr_raw_chaonv2.txt @@ -0,0 +1,878 @@ +Health ASSURANCK +lac-MRA +National Healthcare Systems Co.,Ltd DMSC +2301/2 New Petchburi Road,Soi 47 (Soonvijai) Bangkapi, Huaykwang, Bangkok 10310, Thailand +Tel.(662)762-4000 Fax.- ISO15189 +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name: MR. 
SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: un 46-25 +MRN.: 10A-25-213418 +Lab No.: 10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Complete Blood Count +Specimen......................: EDTA blood +Total WBC.....................: 7.84 *10^3/mm3 (4.00-10.00) +Red Blood Cells...............: 4.97 *10^6/mm3 (4.50-5.90) +Hemoglobin(Hb)................: 15.8 g/dL (13.0-18.0) +Hematocrit(HCT)...............: 46.6 (40.0-54.0) +Mean Cell Volume..............: 93.8 fL (80.0-100.0) +Mean Cell Hemoglobin..........: 31.8 pg (26.0-34.0) +Mean Cell Hb Concentration....: 33.9 g/dL (31.0-37.0) +RBC Distribution Width.......: 13.2 号 (9.0-15.0) +RBC Morphology................: No significant morphological abnormality seen. +WBC Differential +Neutrophils..................: 88.1H 号 (46.5-75.0) +6907 /mm3 (2000-7500) +Lymphocytes.................: 3.8L 号 (12.0-44.0) +298L /mm3 (1500-4000) +Monocytes...................: 5.4 号 (0.0-11.2) +423 /mm3 (200-1000) +Eosinophils..............: 2.2 (0.0-9.5) +172 /mm3 (40-700) +Basophil..................: 0.5 (0.0-2.5) +39 /mm3 (0-200) +Platelet Count..............: 155 10^3/mm3 (150-450) +Mean Platelet Volume..........: 10.3 fL (6.0-12.0) +Platelet Comment.............: Adequate +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Afnan Waedoloh,MT20602 on 20 Dec 2025 19:44 +Authorised by: Afnan Waedoloh,MT20602 on 20 Dec 2025 19:44 Print Date and Time : 20 Dec 2025 20:11 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test resultsfor infomational purposes and is no a substitute for medical advice, diagnosis, r treatment. Variability in specimen quality, test kit performance,reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While every efrt is made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page1/2 +Health SSURANCR +HC 9 +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand Tel.(662)762-4000 Fax.- ISO15189 +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:5n d +46-25 +MRN.:10A-25-213418 +LabNo.:10253540676 Requested Date :20 Dec 202518:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time: 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)ESR +Specimen......................: EDTA +ESR 1 Hour ..................: 3 mm/hr (0-15) +Remark:(H)means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Kamonchat Pikulthong,MT23842 on 20 Dec 2025 20:10 +Authorised by: Kamonchat Pikulthong,MT23842 on 20 Dec 2025 20:10 Print Date and Time : 20 Dec 2025 20:11 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test results fr infomational purposes and is not a substitute for medical advice, diagnosis, r treatment Variability in specimen quality, test kit perfomance, eagent integrity, and the +sensitivity and specifcity of nstruments may influence the findings. While every effort is made to ensure he acuracy and timelines of the infomation provided, results should be considere in the context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +FP-NLS-NHS-00-002/5 Revision 12 Issue date 10/06/2025 +Page 2/2 +Health +National Healthcare Systems Co., Ltd +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand Tel.(662)762-4000 Fax. - +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:En d +46-25 +MRN.: 10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 202518:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time: 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +ASO(Anti-Streptolysin O Titre) +Anti Streptolysin O Titre(ASO): Less than 200 IU/mL (N : Less than 200 IU/mL) +Remark:(H)means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Sarocha Manfak, MT16760 on 20 Dec 2025 20:00 +Authorised by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:03 Print Date and Time : 20 Dec 2025 20:03 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen(S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report presentstet results frinformational purposes ad is not a substitute formedical advice, diagnosis, ortreatment Variability in specimen quality, test kit perfomance,reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While every effort is made to ensure the acuracy and timeliness of the information provided, results should be considered in the context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/1 +Health dUALITY 55URANC +lac MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road,Soi47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand ISO15189 +Tel.(662)762-4000 Fax.- +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex : Male DOB: 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: 46-25 +MRN.: 10A-25-213418 +Lab No.: 10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time: 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Urine Examination +Specimen......................: Urine +Physical Examination +Color.........................: Yellow [Normal: Yellow] +Transparency..................: Clear [Normal : Clear] +Specific Gravity............: 1.020 (1.003-1.030) +pH............................: 8.5H (4.5-8.0) +Chemical Examination +Protein.......................: Negative [Normal : Negative] +Glucose.......................: Negative [Normal : Negative] +Ketone........................: Negative [Normal : Negative] +Bilirubin.....................: Negative [Normal : Negative] +Erythrocyte...................: Negative [Normal: Negative] +Urobilinogen..................: Negative [Normal : 
Negative] +Nitrite.......................: Negative [Normal : Negative] +Leucocyte.....................: Negative [Normal : Negative] +Urine Sedimentation +WBC...........................: 0-1Cells/HPF [Normal:0-5 Cells/HPF] +RBC...........................: 0-1 Cells/HPF [Normal: 0-2 Cells/HPF] +Squamous Epithelial Cell....: 0-1 Cells/HPF [Normal: 0-5 Cells/HPF] +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Kamonchat Pikulthong,MT23842 on 20 Dec 2025 19:47 +Authorised by: Kamonchat Pikulthong,MT23842 on 20 Dec 2025 19:47 Print Date and Time: 20 Dec 2025 19:48 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test resultsfor infomational purposes and is nota substitute for medical advice, diagnosis, r treatment. Variability in specimen quality, test kit performane,reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While every efrtis made to ensure the accuracy and timeliness of the infomation provided,results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/1 +Health 0 SSURANCA +lac-MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand Tel.(662)762-4000 Fax.- ISO 15189 +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name : MR. 
SHUNHU YU +Sex: Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: 5n d +46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Thrombin Time +Specimen type..............: 3.2 % Na citrate plasma +Thrombin Time(TT)..........: 16.8 Secs. (15.8-19.0) +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Afnan Waedoloh,MT20602 on 20 Dec 2025 20:11 +Authorised by: Afnan Waedoloh,MT20602 on 20 Dec 2025 20:11 Print Date and Time : 20 Dec 2025 20:12 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +Thi report present test results fr infomational purposes ad is not a substitute formedical advice, diagnosis, ortreatment Variability in specimen quality, test kit perfomane, reagent integrity, an the +sensitivity and specificity of instruments may influence the findings. While every efrtis made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/4 +Health Q BSRARON +ilaCMRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road,Soi47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand Tel.(662)762-4000 Fax.- ISO15189 +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name: MR. 
SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: n 46-25 +MRN.: 10A-25-213418 +Lab No. : 10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Lipid set (Chol, Tri, HDL, LDL) +Specimen......................: Serum +Cholesterol...................: 175 mg/dL (<200) +Triglyceride..................: 89 mg/dL (<150) +HDL-Cholesterol...............: 52 mg/dL (>40) +LDL Direct....................: 99 mg/dL (<130) +VLDL..........................: 18 mg/dL (<30) +Cholesterol/HDL-C Ratio.......: 3.4 +LDL/HDL Ratio. 1.90 +Normal Range LDL " The American Heart Association (AHA) recommends : +Optimal : <100 mg/dl +Near Optimal :100-129 mg/d1 +Borderline high :130-159 mg/dl +High :160-189 mg/dl +Very high :>=190 mg/d1 +For 10-year ASCVD risk score >= 7.5 % or cardiovascular diseases +Or diabetes mellitus, LDL target should be 70 mg/dl. or lower" +(*)Liver Function Test (8 Tests) +Total Protein.................: 7.24 g/dL (6.40-8.30) +Albumin.......................: 5.07 g/dL (3.50-5.20) +Globulin......................: 2.17 g/dL (2.10-3.70) +Bilirubin(Total)............: 0.6 mg/dL (0.2-1.2) +Bilirubin (Direct)..........: 0.2 mg/dL (0.0-0.5) +ALP(Alkaline Phosphatase).....: 57 U/L (40-150) +AST(Aspartate Transaminase)...: 30 U/L (5-34) +ALT(Alanine Transaminase).....: 41 U/L (0-45) +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: (Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:10 +Authorised by: (Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:11 Print Date and Time : 20 Dec 2025 20:12 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This rport present test results fr infomational purposes and is nota substitute for medical advice, diagnosis, r treatment. Variability in specimen quality, tes kt prformance, reagent integrity, and the +sensitivity and specifcity of instruments may influence the findings. While every efotis made to esure the accuracy and timeliness of the information provided,results should be considered in the context of evolving clinical guidelines and standards.Personal data is handled following applicable data privacy laws. +FP-NLS-NHS-00-002/5 Revision 12 Issue date 10/06/2025 +Page2/4 +Health SSURANOK +lac-MRA +National Healthcare Systems Co., Ltd DMSC +2301/2New Petchburi Road,Soi 47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand Tel.(662)762-4000 Fax.- ISO 15189 +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name : MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age: 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:5n 46-25 +MRN.:10A-25-213418 +LabNo.:10253540676 Requested Date : 20 Dec 202518:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Fibrinogen Level +Fibrinogen Level.......: 228.2 mg/dL (200.0-400.0) +Remark:(H)means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by : Afnan Waedoloh,MT20602 on 20 Dec 2025 20:11 +Authorised by: Afnan Waedoloh,MT20602 on 20 Dec 2025 20:11 Print Date and Time : 20 Dec 2025 20:12 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test results fr infomational purposes and is not a substitute formedical advice, diagnosis, r treatment Variability in specimen quality, test kit performance,reagent integrity, an the +sensitivity and specificity of instruments may influence the findings. While every fort is made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 3/4 +Health 655URANO +laCMEA +National Healthcare Systems Co., Ltd DMS +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi, Huaykwang, Bangkok 10310,Thailand ISO 15189 +Tel.(662)762-4000 Fax.- +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name : MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:n 46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)C-Reactive Protein High Sens. +C-Reactive Protein(High Sens)....: 0.98 mg/L (0.00-5.00) +Reference Range...........: Coronary vascular disease risk assessment +according to AHA/CDC recommendation. +<1.0 mg/L Low risk +1.0-3.0 mg/L Average risk +>3.0 mg/L High risk +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by:(Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:10 +Authorised by: (Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:11 Print Date and Time : 20 Dec 2025 20:12 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This eport present test results or infomational purposes and is not a substitute for medical advice, diagnosis, r treatment Variability in specimen quality, test kit perfomane,reagent integrity, and the +sensitivity and specifcity ofinstruments may influence the findings. While every effort s madeto ensure the acuracy and timeliess of the information provided, results should be considered in the context of evolving clinical guidelines and standards.Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 4/4 +N +Health 一 SSURANC +lac MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi,Huaykwang, Bangkok 10310, Thailand ISO 15189 +Tel.(662)762-4000 Fax.- +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name : MR. SHUNHU YU +Sex: Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:uwn d +46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Prothrombin Time +Specimen type.................: 3.2 % Na citrate plasma +Prothrombin Time(PT)........: 10.6 Secs. (10.1-12.3) +INR 0.93 +PT %........................: 116.5 号 (50.0-150.0) +Remark:(H)means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by : Afnan Waedoloh,MT20602 on 20 Dec 2025 20:11 +Authorised by: Afnan Waedoloh,MT20602 on 20 Dec 2025 20:11 Print Date and Time : 20 Dec 2025 20:11 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test results fr infomational purposes and is not a substitute for medical advice, diagnosis, r treatment. Variability in specimen quality, test kit performance,reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While every ffort is made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/2 +Health SSURANO +ilac MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road,Soi47(Soonvijai) +Bangkapi, Huaykwang, Bangkok 10310, Thailand Tel.(662)762-4000 Fax.- ISO 15189 +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:n 46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Partial Thromboplastin Time(PTT) +Specimen........................: Sodium Citrate plasma +Specimen type.................: 3.2 % Na citrate plasma +Partial Thromboplastin Time(PTT).: 31.6 H Secs. (22.5-31.5) +Mean of Reference Range........: 26.96Secs. +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by : Afnan Waedoloh,MT20602 on 20 Dec 2025 20:11 +Authorised by: Afnan Waedoloh,MT20602 on 20 Dec 2025 20:11 Print Date and Time : 20 Dec 2025 20:11 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test results fr infomational purposes and is nota ubstitute for medical advice, diagnosis, r treatment. Variability in specimen quality, test kit performance,reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While every efort is made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the context of evolving clinical guidelines and standards.Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 2/2 +Health Q S5URANOA +ac MRA +National Healthcare Systems Co.,Ltd DMS +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand Tel.(662)762-4000 Fax.- ISO15189 +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex: Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:n 46-25 +MRN.:10A-25-213418 +LabNo.:10253540676 Requested Date :20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Apolipoprotein B +Apolipoprotein B.........: 81.70 mg/dL (49.00-173.00) +Remark:(H)means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Sarocha Manfak, MT16760 on 20 Dec 2025 20:13 +Authorised by: Sarocha Manfak, MT16760 on 20 Dec 2025 20:14 Print Date and Time : 20 Dec 2025 20:14 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. 
+This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test results frinfomational purposes and is not a substitute for medical advice, diagnosis, ortreatment Variability in specimen quality, test kit perfomane, eagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While every efort is made to ensure the accuracy and timelinessoftheinformation provided, results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/1 +Health QuALete 55URANC +ilac-MRA +National Healthcare Systems Co.,Ltd DMSC +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand Tel.(662)762-4000 Fax. - ISO15189 +Email:nhcslab@nhealth-asia.com Accreditation No. 4120/47 +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: n d +46-25 +MRN.: 10A-25-213418 +Lab No.: 10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time: 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Glucose(Fasting) +Specimen.............: Serum +Glucose(Fasting)........: 89 mg/dL (70-99) +(*)Glycated Hb(HbA1C) +Specimen.....................: EDTA blood +Haemoglobin Alc............: 5.1 号 (<5.7) +Estimated Average Glucose.....: 100 mg/dL +(*)Blood urea nitrogen +Specimen.................: Serum +Blood Urea Nitrogen........: 13.10 mg/dL (8.40-25.70) +(*)Creatinine(plus eGFR) +Specimen....................: Serum +Creatinine....................: 0.76 mg/dL (0.73-1.18) +eGFR (African-American)......: 117.38 ml/min/1.73m2 +eGFR (Non African-American)...: 101.28 ml/min/1.73 m2 +eGFR for Thai.................: 108.78 ml/min/1.73 m2 +eGFR.........................: 
104.84 ml/min/1.73m2 +eGFR Comment : Calculated by CKD- EPI formula according to +National kidney foundation recommendation. +(2021) This equation should only be used for patients 18 and older. +(*)uric acid +Specimen................: Serum +Uric Acid.................: 6.3 mg/dL (3.5-7.2) +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: (Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:10 +Authorised by: (Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:11 Print Date and Time: 20 Dec 2025 20:22 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +sensitivity and specificity of instruments may influence the findings. While every efort is made to ensure the accuracy and timeliness of the infomation provided,results should be considered in the context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +FP-NLS-NHS-00-002/5 Revision 12 Issue date 10/06/2025 +Page 1/2 +Health S5URANCK +lac-MRA +National Healthcare Systems Co.,Ltd DMSC +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand Tel.(662)762-4000 Fax. - ISO15189 +Email:nhcslab@nhealth-asia.com Accreditation No. 4120/47 +LABORATORY REPORT +Patient Name: MR. 
SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age: 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: 5 d +46-25 +MRN.: 10A-25-213418 +Lab No.: 10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time: 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)GGT( gamma GT) +Specimen...............: Serum +Gamma Glutamyl Transferase...: 25 U/L (12-64) +(*)Inorganic phosphate +Specimen.....................: Serum +Inorganic phosphate........: 3.4 mg/dL (2.5-4.5) +(*)Magnesium +Specimen....................: Serum +Magnesium(Mg)............: 2.3 mg/dL (1.6-2.6) +(*)Electrolytes +Specimen......................: Serum +Sodium......................: 138 mmol/L (136-145) +Potassium.....................: 4.80 mmol/L (3.50-5.10) +Chloride.....................: 105 mmol/L (98-107) +TCO2..........................: 25 mmol/L (22-29) +Anion gap ...................: 8.0L mmol/L (10.0-12.0) +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: (Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:21 +Authorised by: (Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:22 Print Date and Time: 20 Dec 2025 20:22 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test results fr infomational purposes ad is not a substitute for medical advice, diagnosis, o treatment Variability in specimen qualty, test kit perfomane, reagent integrity, an the +sensitivity and specificity of instruments may influence the findings. 
While every fort is made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the context of evolving clinical guidelines and standards.Personal data is handled following applicable data privacy laws. +FP-NLS-NHS-00-002/5 Revision 12 Issue date 10/06/2025 +Page 2/2 +0 S5URANO +Health +ilac MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road, Soi 47 (Soonvijai) +Bangkapi, Huaykwang, Bangkok 10310, Thailand Tel.(662)762-4000 Fax. - ISO 15189 +Email:nhcslab@nhealth-asia.com Accreditation No.4120/47 +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:n , 46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date :20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Lipoprotein(a) +Lipoprotein(a).........: 4.82 mg/dL (<30.00) +Method.................: Immunoturbidimetric assay +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:36 +Authorised by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:36 Print Date and Time: 20 Dec 2025 20:36 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report presents testresults forinformational purposes and is not a substitute formedical advice, diagnosis, r treatment. Variability in specimen quality, test kit prformance, reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. 
While every efort is made to ensure the accuracy and timeliness of the information provided,results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/1 +Health +lkC 6 +National Healthcare Systems Co., Ltd DMSc +2301/2 New Petchburi Road,Soi47(Soonvijai) +Bangkapi,Huaykwang, Bangkok 10310, Thailand ISO15189 +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex: Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: 46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)IgG(Immunoglobulin G) Level +Immunoglobulin G(IgG)......: 1085.0 mg/dL (700.0-1600.0) +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: (Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:41 +Authorised by: (Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:42 Print Date and Time : 20 Dec 2025 20:42 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" This report presents test results for informational purposes and is not a substitute for medical advice, dignosis or treatment. Variability in specimen quality, est it perfomance, reagent integrity, and the +sensitivity and specificity of instruments may influence he findings. While every fort is made to ensure the accuracy and timeliness of the infomation provided, esults should be considered in the +context of evolving clinical guidelines and standards. 
Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/2 +SUKAN +Health +lac MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road, Soi 47 (Soonvijai) +Bangkapi, Huaykwang, Bangkok 10310, Thailand ISO 15189 +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex: Male DOB : 03 Jun 1968 Age : 57Y6M17 D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: 46-25 +MRN.: 10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 202518:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time: 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)IgA(Immunoglobulin A) Level +Immunoglobulin A(IgA)........: 125.00mg/dL (70.00-400.00) +(*)Complement C3(B1C) +Complement C3(B1C).........: 107.0 mg/dl (90.0-180.0) +(*)ComplementC4 +Complement C4 .............: 19.60 mg/dL (10.00-40.00) +Remark:(H)means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by : (Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:41 +Authorised by: (Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:42 Print Date and Time : 20 Dec 2025 20:42 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" This reportpresents test results for informationa purposes and is not a substitute for medical advice, dignosis,or treatment. Variability in specimen quality, tet it performance, reagent integrity, and the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. 
+Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 2/2 +0 +Health +lac-MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road, Soi 47 (Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand ISO15189 +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: n d 46-25 +MRN.: 10A-25-213418 +Lab No.: 10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Triiodothyronine(T3) +Total T3(Triiodothyronine)...: 97 ng/dL (57-152) +T3 Reference Range : +Female/Male 4 days-364 days 85-234 ng/dL +Female/Male 1-11 years 113-189 ng/dL +Female/Male 12-14 years 98-176 ng/dL +Female 15-16 years 92-142 ng/dL +Male 15-16 years 94-156 ng/dL +Female/Male 17-18 years 90-168 ng/dL +Female >19 years 57-152 ng/dL +Male >19 years 57-152 ng/dL +(*)Thyroxine(T4) +Total T4(Thyroxine).........: 6.39 ug/dL (4.87-11.72) +T4 Reference Range : +Female/Male 7 days-364 days 5.87-13.67 ug/dL +Female/Male 1-8years 6.16-10.32 ug/dL +Female/Male 9-11 years 5.48-9.31 ug/dL +Female 12-13 years 5.08-8.34 ug/dL +Female 14-18 years 5.46-12.99 ug/dL +Male 12-13 years 5.01-8.28 ug/dL +Male 14-18 years 4.68-8.62 ug/dL +Female/Male >19 years 4.87-11.72 ug/dL +(*)TSH(Thyroid Stimulating Hormone) +Specimen.....................: Serum +TSH..........................: 0.528 uIU/mL (0.350-4.940) +TSH Reference Range : +Female/Male 4 days-179 days 0.73-4.77 uIU/mL +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: (Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:40 +Authorised by: (Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:41 Print Date and Time : 20 Dec 2025 20:46 +This report has been approved electronically. 
Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report presents et results fr informational purposes and is nota substitute for medical advice, diagnosis,or treatment. Variability in specimen quality, est it performance, reagent integrity, and the +sensitivity and specificity of instruments may influence thefindings. While every ffort is made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/6 +55URANCK +Health +lac-MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road, Soi 47 (Soonvijai) +Bangkapi, Huaykwang, Bangkok 10310, Thailand ISO15189 +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name: MR. 
SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: n d 46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +Female/Male 180 days-13 years 0.70-4.17 uIU/mL +Female/Male 14-19 years 0.47-3.41 uIU/mL +Female/Male > 20 years 0.35-4.94 uIU/mL +(*)Triiodothyronine Free(Free T3) +Specimen......................: Serum +Free T3(Free Triiodothyronine): 2.64 pg/mL (1.58-3.91) +FT3 Reference Range : +Female/Male 4 days-364 days 2.32-4.87 pg/mL +Female/Male 1-11 years 2.79-4.42pg/mL +Female 12-14 years 2.50-3.95 pg/mL +Female 15-18 years 2.31-3.71 pg/mL +Male 12-14 years 2.89-4.33 pg/mL +Male 15-18 years 2.25-3.85 pg/mL +Female/Male >19 years 1.58-3.91 pg/mL +(*)Thyroxine Free (Free T4) +Specimen......................: Serum +Free T4 (Free Thyroxine)......: 0.84 ng/dL (0.70-1.48) +FT4 Reference Range: +Female/Male 5 days-14 days 1.05-3.21 ng/dL +Female/Male 15 days-29 days 0.68-2.53 ng/dL +Female/Male 30 days -364 days 0.89-1.7 ng/dL +Female/Male 1-18 years 0.89-1.37 ng/dL +Female/Male > 19 years 0.7-1.48 ng/dL +(*)Alpha Fetoprotein(AFP) +AFP(Alpha Fetoprotein)......: 2.49 ng/mL (0.89-8.78) +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by:(Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:40 +Authorised by: (Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:41 Print Date and Time : 20 Dec 2025 20:46 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +ensitivity and specificity of instruments may influence the findings. 
While every ffort is made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page2/6 +Q 6 55URANC +Health +lac MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road, Soi 47 (Soonvijai) +Bangkapi, Huaykwang, Bangkok 10310, Thailand ISO 15189 +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name : MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:n 。, d 46-25 +MRN.:10A-25-213418 +LabNo.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Carcinoembryonic Antigen(CEA) +CEA(Carcinoembryonic Antigen).: 0.90 ng/mL (0.00-5.00) +(*)Carbohydrate Antigen 19-9(Digestive Tract) +Carbohydrate Antigen 19-9.....: 0.98 U/mL (0.00-37.00) +(*)Ferritin +Ferritin......................: 399.6 Hng/mL (15.0-200.0) +(*)Estradiol(E2) +Specimen......................: Serum +Estradiol(E2).................: <20.0 pg/mL +Normal Menstruating Females +Follicular Phase 21-251 pg/mL +Mid-Cycle Phase 38-649 pg/mL +Luteal Phase 21- 312 pg/mL +Postmenopausal Females not on HRT <28 pg/mL +Postmenopausal Females on HRT* <144 pg/mL +Males<44 pg/mL +(*)Luteinizing Hormone(LH) +LH(Luteinizing Hormone).....: 4.79 mIU/mL +Reference Range: Male 0.57-12.07 +Female +Follicular Phase 1.80-11.78 +Remark:(H) means higher than reference values;(L)means lower than reference values;(*)ISO 15189 Accreditied +Reported by : Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:45 +Authorised by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:46 Print Date and Time : 20 Dec 2025 20:46 +This report has been approved electronically. 
Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +sensitivity and specificity of instruments may influence thefindings. While every ffort is made to ensure the accuracy and timeliness of the information provided, results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 3/6 +Q S5URANC +Health +lacMRA +National Healthcare Systems Co., Ltd DMSc +2301/2 New Petchburi Road,Soi 47(Soonvijai) +Bangkapi, Huaykwang, Bangkok 10310, Thailand ISO15189 +Tel.(662)762-4000 Fax.- Accreditation No. 4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex: Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:n d 46-25 +MRN.: 10A-25-213418 +Lab No.:10253540676 Requested Date :20 Dec 202518:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time: 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +Mid-cycle Peak 7.59-89.08 +Luteal Phase 0.56-14.00 +Postmenopausal Phase 5.16-61.99 +(*)Progesterone +Progesterone..................: <0.20 ng/mL +Reference Range...............: Normal Menstruating Females +Follicular Phase <0.3 ng/mL +Luteal Phase 1.2-15.9 ng/mL +Postmenopausal Females <0.2 ng/mL +Pregnant Females +First Trimester 2.8-147.3 ng/mL +Second Trimester 22.5 - 95.3 ng/mL +Third Trimester 27.9 - 242.5 ng/mL +Male<0.2 ng/mL +(*)Folicle Stimulating Hormone(FSH) +Specimen......................: Serum +FSH...........................: 6.69 mIU/mL +Reference Range: Male 0.95-11.95 +Female +Follicular Phase 3.03-8.08 +Mid-cycle Peak 2.55-16.69 +Luteal Phase 1.38-5.47 +Postmenopausal Phase 26.72-133.41 +Remark:(H) means higher than reference values;(L)means lower than reference 
values;(*)ISO 15189 Accreditied +Reported by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:45 +Authorised by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:46 Print Date and Time : 20 Dec 2025 20:46 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen(S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report present test results or infomational purposes and is nota substitute for medical advice, diagnosis, r treatment. Variability in specimen quality, et it performance, reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While ever ffort is made to ensure the accuracy and timeliness of the information provided,results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. +Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 4/6 +Q A55URANCR +Health +lac-MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road,Soi 47(Soonvijai) ISO15189 +Bangkapi, Huaykwang, Bangkok 10310, Thailand +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name: MR. SHUNHU YU +Sex: Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: n 46-25 +MRN.: 10A-25-213418 +Lab No. 
: 10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time: 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Testosterone +Specimen......................: Serum +Testosterone .................: 4.77 ng/mL +Testosterone..................: 477.00 ng/dL +Reference Range...............: Male 21-49 years : 240-871 ng/dL +Male >= 50 years : 221-716 ng/dL +Female 21-49 years : 14-53 ng/dL +Female >=50 years : 12-36 ng/dL +(*)Dehydroepiandrosterone Sulphate(DHEAS)(Alinity) +DHEA-Sulphate.................: 580.8H ug/dL (48.6-361.8) +(*)Prolactin +Specimen......................: Serum +Prolactin ....................: 5.03 ng/mL (3.46-19.40) +(*)Cortisol(Blood) +Cortisol......................: 9.000 ug/dL +Serum AM : 3.7 - 19.4 ug/dL +Serum PM : 2.9 - 17.3 ug/dL +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by : Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:45 +Authorised by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:46 Print Date and Time: 20 Dec 2025 20:46 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen(S) received on the above date: Copyright issued by N Health."DO NOT COPY" This report presents test results for informational purposes and is not a substitute for medical advice, ignosis, or treatment. Variabilit in specimen quality, est kit perfomance, reagen ntegrity, and the +sensitivity and specificity of nstruments may influence the findings. While every effort is made to ensure te acuracy and timeliness of the infomation provided, results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. 
+Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 5/6 +Health Q +lac-MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road, Soi 47(Soonvijai) +Bangkapi, Huaykwang, Bangkok 10310, Thailand ISO 15189 +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name : MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: 46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)Thyroglobulin Antibody +Thyroglobulin Antibody.....: 0.73 IU/ml (0.00-4.11) +Remark:(H) means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: (Auto)Warunee Jit.,MT7166 on 20 Dec 2025 20:43 +Authorised by: (Auto) Warunee Jit.,MT7166 on 20 Dec 2025 20:44 Print Date and Time : 20 Dec 2025 20:46 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This repor presents testresults forinformational purposes and is not a substitute for medical advice, diagnosis, r treatment. Variability in specimen quality, tet it prformance, reagent integrity, and the +sensitivity and specificity of instruments may influence thefindings. While every fort is made to ensure the accuracy and timeliness of the infomation provided,results should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. 
+Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 6/6 +N Q 65SURANC +Health +lac-MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road, Soi 47(Soonvijai) +Bangkapi,Huaykwang,Bangkok 10310,Thailand ISO 15189 +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name : MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age :57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor:n d 46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)IgE(Immunoglobulin E) Level +Immunoglobulin E(IgE)......: 34.70 IU/mL (<100.00) +Method........................: Electrochemiluminescence immunoassay +Remark:(H)means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:48 +Authorised by : Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:48 Print Date and Time: 20 Dec 2025 20:48 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen (S) received on the above date: Copyright issued by N Health."DO NOT COPY" This report presents test results for informational purposes and is not a substitute for medical advice, diagnosis,or treatment. Variability in specimen quality, est it performane, reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While every ffort is made to ensure the accuracy and timeliness of the information provided, esults should be considered in the +context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. 
+Issue date 10/06/2025 +FP-NLS-NHS-00-002/5 Revision 12 Page 1/1 +Q S5URANC +Health +lac-MRA +National Healthcare Systems Co., Ltd DMSC +2301/2 New Petchburi Road, Soi 47 (Soonvijai) +Bangkapi, Huaykwang,Bangkok 10310, Thailand ISO 15189 +Tel.(662)762-4000 Fax.- Accreditation No.4120/47 +Email:nhcslab@nhealth-asia.com +LABORATORY REPORT +Patient Name : MR. SHUNHU YU +Sex : Male DOB : 03 Jun 1968 Age : 57Y6M17D Address/Ref.No.: Be.U wellness center(Cash)(C1W)/CASH02/HN.000 +Doctor: 46-25 +MRN.:10A-25-213418 +Lab No.:10253540676 Requested Date : 20 Dec 2025 18:36 +Collected Date/Time : 20 Dec 2025 +Received Date/Time : 20 Dec 2025 18:53 +Test Name Result Unit Reference Range +(*)IgM( Immunoglobulin M) Level +Immunoglobulin M(IgM).......: 35.10 Lmg/dL (40.00-230.00) +Remark:(H)means higher than reference values;(L) means lower than reference values;(*)ISO 15189 Accreditied +Reported by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:48 +Authorised by: Nurhuda Suetoh,MT21824 on 20 Dec 2025 20:49 Print Date and Time : 20 Dec 2025 20:50 +This report has been approved electronically. Information contained in this document is CONFIDENTIAL. Copyright: Issued by N Health. +This report is only for the specimen(S) received on the above date: Copyright issued by N Health."DO NOT COPY" +This report presents tes results fr informational purposes and is not a substitute for medical advice, diagnosis, r treatment. Variability in specimen quality, tet kit prformance, reagent integrity, and the +sensitivity and specificity of instruments may influence the findings. While every efrtis made to ensure the accuracy and timeliness of the infomation provided, results should be considered in the context of evolving clinical guidelines and standards. Personal data is handled following applicable data privacy laws. 
+Issue date 10/06/2025
+FP-NLS-NHS-00-002/5 Revision 12 Page 1/1
\ No newline at end of file
diff --git a/backend/parse_medical_v2.py b/backend/parse_medical_v2.py
new file mode 100644
index 0000000..08e7c07
--- /dev/null
+++ b/backend/parse_medical_v2.py
@@ -0,0 +1,1143 @@
+"""
+Optimized medical data parsing module: handles multiple OCR formats (English and Chinese)
+"""
+import re
+
+
+# ============================================================
+# Chinese health-check report parsing
+# (format: 检查名称 检查结果 参考值 单位, i.e. test name, result, reference value, unit)
+# ============================================================
+
+# Chinese item name → ABB mapping (matched in descending key-length order so short names do not shadow longer ones)
+CN_NAME_TO_ABB = {
+    # Urinalysis
+    '颜色': 'Color', '透明度': 'Clarity', '比重': 'SG', '酸碱度': 'pH',
+    '蛋白质': 'PRO', '葡萄糖': 'GLU', '酮体': 'KET', '胆红素': 'BIL',
+    '尿胆原': 'URO', '亚硝酸盐': 'NIT', '白细胞酯酶': 'LEU', '隐血': 'BLD',
+    # Blood type
+    'ABO血型': 'ABO', 'Rh(D)血型': 'Rh(D)',
+    # Complete blood count (long names first)
+    '中性粒细胞百分率(NEUT%)': 'NEUT%', '中性粒细胞数(NEUT#)': 'NEUT',
+    '淋巴细胞百分率(LYMPH%)': 'LYMPH%', '淋巴细胞数(LYMPH#)': 'LYMPH',
+    '单核细胞百分率(MONO%)': 'MONO%', '单核细胞数(MONO#)': 'MONO',
+    '嗜酸性粒细胞百分率(EO%)': 'EOS%', '嗜酸性粒细胞数(EO#)': 'EOS',
+    '嗜碱性粒细胞百分率(BASO%)': 'BAS%', '嗜碱性粒细胞数(BASO#)': 'BAS',
+    '白细胞计数(WBC)': 'WBC', '白细胞计数': 'WBC',
+    '红细胞计数(RBC)': 'RBC', '红细胞计数': 'RBC',
+    '血红蛋白量(HGB)': 'Hb', '血红蛋白量': 'Hb', '血红蛋白': 'Hb',
+    '红细胞比积(HCT)': 'HCT', '红细胞比积': 'HCT', '红细胞压积': 'HCT',
+    '平均红细胞体积(MCV)': 'MCV', '平均红细胞体积': 'MCV',
+    '平均红细胞血红蛋白量(MCH)': 'MCH', '平均红细胞血红蛋白量': 'MCH',
+    '平均红细胞血红蛋白浓度(MCHC)': 'MCHC', '平均红细胞血红蛋白浓度': 'MCHC',
+    '红细胞分布宽度-标准差(RDW-SD)': 'RDW-SD', '红细胞分布宽度-变异系数(RDW-CV)': 'RDW',
+    '血小板计数(PLT)': 'PLT', '血小板计数': 'PLT',
+    '血小板比积(PCT)': 'PCT', '平均血小板体积(MPV)': 'MPV',
+    '血小板分布宽度(PDW)': 'PDW', '大型血小板比率(P-LCR)': 'P-LCR',
+    # Liver function
+    '总胆红素': 'TBil', '直接胆红素': 'DBil', '间接胆红素': 'IBil',
+    '总蛋白': 'TP', '白蛋白': 'ALB', '球蛋白': 'GLB', '白球比值': 'A/G',
+    '谷丙转氨酶': 'ALT', '谷草转氨酶': 'AST',
+    'γ-谷氨酰基转移酶': 'GGT', 'γ-谷氨酰转移酶': 'GGT',
+    '碱性磷酸酶': 'ALP',
+    '乳酸脱氢酶': 'LDH', '转铁蛋白': 'Tf',
+    '胆碱酯酶': 'CHE',
+    # Kidney function
+    '尿素': 'BUN', '肌酐': 'Scr', '尿酸': 'UA',
+    '胱抑素C': 'CysC', '血清β2微球蛋白': 'β2-MG',
+    # Blood lipids
+    '甘油三酯': 'TG', '总胆固醇': 'TC',
+
'高密度脂蛋白胆固醇': 'HDL', '低密度脂蛋白胆固醇': 'LDL', + '游离脂肪酸': 'FFA', '脂蛋白(a)': 'Lp(a)', + # 血糖 + '葡萄糖(空腹)': 'FBS', '胰岛素(空腹)': 'INS', + '糖化血红蛋白': 'HbA1C', + # 心肌酶 + '肌酸激酶同工酶MB': 'CK-MB', '肌酸激酶': 'CK', + # 心血管风险因子 + '超敏C反应蛋白': 'hs-CRP', '同型半胱氨酸': 'Hcy', + # 甲状腺 + '三碘甲状腺原氨酸T3': 'T3', '甲状腺素T4': 'T4', + '游离三碘甲状腺原氨酸FT3': 'FT3', '游离甲状腺素FT4': 'FT4', + '促甲状腺素TSH': 'TSH', '甲状腺球蛋白': 'Tg', + '抗甲状腺球蛋白抗体': 'TgAb', '抗甲状腺过氧化物酶抗体': 'TPO-Ab', + # 胃功能 + '胃蛋白酶原I': 'PGI', '胃蛋白酶原Ⅱ': 'PGII', '胃蛋白酶原比值': 'PGR', + '胃泌素-17': 'G-17', + # 传染病 + '乙肝表面抗原': 'HBsAg', '乙肝表面抗体': 'HBsAb', + '乙肝e抗原': 'HBeAg', '乙肝e抗体': 'HBeAb', '乙肝核心抗体': 'HBcAb', + # 风湿/免疫 + 'C反应蛋白': 'CRP', '抗链球菌溶血素"0"': 'ASO', '抗链球菌溶血素': 'ASO', + '抗核抗体': 'ANA', '类风湿因子': 'RF', + # 电解质 + '钾': 'K', '钠': 'Na', '氯': 'Cl', '总钙': 'Ca', '磷': 'P', + # 骨代谢 + '甲状旁腺素': 'PTH', '骨钙素': 'OST', + # 贫血/维生素 + '维生素B12': 'VitB12', '血清铁蛋白': 'Fer', + '维生素B9(叶酸)血药浓度测定': 'Folate', '叶酸': 'Folate', + '25-羟基维生素D血药浓度测定': '25-OH-VD2+D3', '25-羟基维生素D': '25-OH-VD2+D3', + '25-羟基维生素D3血药浓度测定': 'VD3', '25-羟基维生素D2血药浓度测定': 'VD2', + '维生素A血药浓度测定': 'VitA', '维生素E血药浓度测定': 'VitE', + '维生素K1血药浓度测定': 'VitK1', + '维生素B1血药浓度测定': 'VitB1', '维生素B2血药浓度测定': 'VitB2', + '维生素B3血药浓度测定': 'VitB3', '维生素B5血药浓度测定': 'VitB5', + '维生素B6血药浓度测定': 'VitB6', + # 肿瘤标志物 + '甲胎蛋白': 'AFP', '癌胚抗原': 'CEA', + '糖类抗原19-9': 'CA19-9', '糖类抗原72-4': 'CA72-4', + '糖类抗原24-2': 'CA24-2', '糖类抗原50': 'CA50', + '糖类抗原125': 'CA125', + '神经元特异性烯醇化酶': 'NSE', '细胞角蛋白19片段': 'CYFRA21-1', + '鳞状细胞癌相关抗原': 'SCC', + '胃泌素释放肽前体': 'ProGRP', + '总前列腺特异抗原': 'TPSA', '游离前列腺特异抗原': 'FPSA', + '游离PSA/总PSA': 'F/TPSA', + # 碳13呼气试验 + '碳13尿素呼气试验DOB值': 'C13-DOB', +} +# 按key长度降序排列,确保长名称优先匹配 +CN_SORTED_KEYS = sorted(CN_NAME_TO_ABB.keys(), key=len, reverse=True) + +# 中文报告中应跳过的行关键词 +CN_SKIP_PATTERNS = [ + '检查名称', '检查结果', '参考值', '单位', # 表头 + '打印日期', '健康管理体检报告', '身份证', # 页眉 + '科室小结', '医生建议', '检查医师', '检查日期', # 非数据行 + '既往病史', '体检单号', + '检查所见', '检查提示', '检查结论', # 影像描述 + '近3次体检', '最高值', '最低值', '结果', # 趋势图 + '健康服务中心', 'EPIQTC', 'DlanYo', 'S5-1', 'HRes', # 超声设备信息 + '彩色血流', 'MI1.2', 
'MI 0.7', 'Generi', 'A/B Ratio', +] + + +def _is_cn_report(lines: list) -> bool: + """检测是否是中文体检报告格式""" + cn_markers = 0 + for line in lines[:50]: # 只检查前50行 + if '健康管理体检报告' in line: + return True + if '检查名称' in line and '检查结果' in line: + return True + if re.match(r'^[一二三四五六七八九十]+.*检查$', line): + cn_markers += 1 + return cn_markers >= 2 + + +def _parse_cn_data_line(line: str, source_file: str) -> dict: + """ + 解析中文体检报告的数据行 + 格式: 检查名称 检查结果 参考值 单位 + 例如: 白细胞计数(WBC) 5.1 3.5-9.5 x10^9/L + 甲胎蛋白 0.5 <=7.0 ng/ml + 维生素B1血药浓度测定 ↓ 1.67 2.4-9.02 ng/ml + """ + # 先查找ABB + abb = None + project = None + + for cn_key in CN_SORTED_KEYS: + if cn_key in line: + abb = CN_NAME_TO_ABB[cn_key] + project = cn_key + # 取项目名之后的部分 + rest = line[line.index(cn_key) + len(cn_key):].strip() + break + + if not abb or not rest: + return None + + # 如果行内还有括号内的英文ABB如 (WBC),去掉 + rest = re.sub(r'^\s*\([A-Za-z0-9%#]+\)', '', rest).strip() + + # 处理异常标记 ↓ ↑ * 在结果前面 + point = '' + if rest.startswith('↓'): + point = '↓' + rest = rest[1:].strip() + elif rest.startswith('↑'): + point = '↑' + rest = rest[1:].strip() + elif rest.startswith('*'): + rest = rest[1:].strip() + + # 尝试解析: 数值 参考范围 单位 + # 模式1: 数值 参考范围 单位 (如 "5.1 3.5-9.5 x10^9/L") + # 模式2: 数值 <=参考值 单位 (如 "0.5 <=7.0 ng/ml") + # 模式3: 定性结果 (如 "阴性", "阳性", "正常", "未检出") + # 模式4: 定性结果 定性参考 (如 "阴性 阴性") + + # 定性结果 + qualitative = ['阴性', '阳性', '弱阳性', '正常', '未检出', '未提示', + '深黄色', '浅黄色', '黄色', '清亮', '混浊', + 'A型', 'B型', 'AB型', 'O型', + '拒检指检', '无', '肥胖'] + for q in qualitative: + if rest.startswith(q): + result = q + ref_rest = rest[len(q):].strip() + reference = ref_rest.split()[0] if ref_rest.split() else '' + return { + 'abb': abb, 'project': project, 'result': result, + 'point': point, 'unit': '', 'reference': reference, + 'source': source_file + } + + # 数值型结果解析 + # 匹配: 数值 [参考范围] [单位] + # 数值可能带 < > 前缀,如 "<2.00" + # 参考范围格式: "3.5-9.5", "<=7.0", ">=30", "<1.0", ">1.0", "无参考范围" + parts = rest.split() + if not parts: + return None + + # 第一个token应该是数值结果 + result_str = 
parts[0] + # 验证是数值(可带<>前缀) + if not re.match(r'^[<>]?[\d\.]+$', result_str) and result_str not in qualitative: + return None + + result = result_str + reference = '' + unit = '' + + # 解析剩余部分 + remaining = parts[1:] + if remaining: + # 检查是否是参考范围 (含数字或<=/>=/无参考范围) + ref_part = remaining[0] + if re.match(r'^[<>=\d\.\-]+', ref_part) or '参考范围' in ref_part: + reference = ref_part + if len(remaining) > 1: + unit = remaining[1] + else: + # 可能直接是单位 + unit = ref_part + + # 检测异常标记(如果结果超出参考范围但没有↑↓标记) + if not point and reference and result: + try: + val = float(result.replace('<', '').replace('>', '')) + ref_match = re.match(r'^([\d\.]+)-([\d\.]+)$', reference) + if ref_match: + low = float(ref_match.group(1)) + high = float(ref_match.group(2)) + if val < low: + point = '↓' + elif val > high: + point = '↑' + elif reference.startswith('<='): + threshold = float(reference[2:]) + if val > threshold: + point = '↑' + elif reference.startswith('>='): + threshold = float(reference[2:]) + if val < threshold: + point = '↓' + except (ValueError, TypeError): + pass + + return { + 'abb': abb, 'project': project, 'result': result, + 'point': point, 'unit': unit, 'reference': reference, + 'source': source_file + } + + +def parse_chinese_medical_data(text: str, source_file: str) -> list: + """解析中文健康管理体检报告""" + items = [] + lines = [l.strip() for l in text.split('\n') if l.strip()] + + for line in lines: + # 跳过无关行 + if any(p in line for p in CN_SKIP_PATTERNS): + continue + + # 跳过页眉(姓名行) + if re.match(r'^姓名', line): + continue + + # 跳过科室标题行(如 "十、血常规检查") + if re.match(r'^[一二三四五六七八九十百]+[、.]', line): + continue + + # 跳过纯文字描述行(无数字的行通常不是数据行) + if not re.search(r'\d', line): + continue + + # 跳过趋势图中的年份行 + if re.match(r'^\d{4}-\d{2}-\d{2}$', line): + continue + + # 跳过超声图片注释行 + if re.match(r'^\d+:\d+', line) or 'cm' == line.strip(): + continue + + # 尝试解析为数据行 + item = _parse_cn_data_line(line, source_file) + if item: + items.append(item) + + return items + + +def parse_medical_data_v2(text: str, 
source_file: str) -> list: + """从OCR文本中解析医疗检测数据 - 优化版,支持英文+中文格式""" + lines = [l.strip() for l in text.split('\n') if l.strip()] + + # 自动检测报告语言/格式 + if _is_cn_report(lines): + print(" [检测] 中文体检报告格式,使用中文解析器") + return parse_chinese_medical_data(text, source_file) + + # 以下是原有的英文报告解析逻辑 + items = [] + + # 项目名称到ABB的映射 + name_to_abb = { + # 血常规 - 添加更多变体 + 'mean cell hb concentration': 'MCHC', 'mchc': 'MCHC', + 'mean corpuscular hemoglobin concentration': 'MCHC', + 'mean corpuscular hemoglobin': 'MCH', 'mean cell hemoglobin': 'MCH', + 'rbc distribution width': 'RDW', 'rdw': 'RDW', + 'red cell distribution width': 'RDW', + 'total wbc': 'WBC', 'white blood cell': 'WBC', 'wbc': 'WBC', 'white blood cells': 'WBC', + 'red blood cell': 'RBC', 'rbc count': 'RBC', 'total rbc': 'RBC', 'red blood cells': 'RBC', + 'hemoglobin(hb)': 'Hb', 'hemoglobin': 'Hb', + 'hematocrit(hct)': 'HCT', 'hematocrit': 'HCT', 'hct': 'HCT', + 'mean cell volume': 'MCV', 'mcv': 'MCV', 'mean corpuscular volume': 'MCV', + 'platelet count': 'PLT', 'platelet': 'PLT', 'plt': 'PLT', 'platelets': 'PLT', + 'mean platelet volume': 'MPV', 'mpv': 'MPV', + 'neutrophil': 'NEUT', 'neut': 'NEUT', 'neutrophils': 'NEUT', + 'lymphocyte': 'LYMPH', 'lymph': 'LYMPH', 'lymphocytes': 'LYMPH', + 'monocyte': 'MONO', 'mono': 'MONO', 'monocytes': 'MONO', + 'eosinophil': 'EOS', 'eos': 'EOS', 'eosinophils': 'EOS', + 'basophil': 'BAS', 'bas': 'BAS', 'basophils': 'BAS', + 'esr': 'ESR', 'erythrocyte sedimentation': 'ESR', 'esr 1 hour': 'ESR', + 'esr 1 hour': 'ESR', # 重复确保匹配 + + # 血糖 - 使用标准ABB: FBS, HbA1C + 'glucose(fasting)': 'FBS', 'fasting glucose': 'FBS', 'glucose': 'GLU', + 'fasting blood sugar': 'FBS', 'fbs': 'FBS', + 'hba1c': 'HbA1C', 'glycated hemoglobin': 'HbA1C', 'haemoglobin a1c': 'HbA1C', + 'haemoglobin alc': 'HbA1C', 'hemoglobin a1c': 'HbA1C', + 'estimated average glucose': 'EAG', + + # 血脂 + 'hdl-cholesterol': 'HDL', 'hdl cholesterol': 'HDL', 'hdl': 'HDL', + 'ldl-cholesterol': 'LDL', 'ldl cholesterol': 'LDL', 'ldl direct': 
'LDL', + 'ldl-cholesterol(direct)': 'LDL', + 'vldl-cholesterol': 'VLDL', 'vldl': 'VLDL', + 'total cholesterol': 'TC', 'cholesterol': 'TC', + 'triglyceride': 'TG', 'tg': 'TG', + 'cholesterol/hdl-c ratio': 'TC/HDL', 'cholesterol/hdl ratio': 'TC/HDL', + 'ldl/hdl ratio': 'LDL/HDL', + 'lipoprotein(a)': 'Lp(a)', 'lipoprotein a': 'Lp(a)', + 'apolipoprotein a1': 'ApoA1', 'apolipoprotein a': 'ApoA1', + 'apolipoprotein b': 'ApoB', + + # 肝功能 - 注意:ast/alt需要精确匹配,避免误匹配 + 'alt(alanine transaminase)': 'ALT', 'alanine aminotransferase': 'ALT', 'sgpt': 'ALT', + 'ast(aspartate transaminase)': 'AST', 'aspartate aminotransferase': 'AST', 'sgot': 'AST', + 'gamma glutamyl transferase': 'GGT', 'gamma gt': 'GGT', 'ggt': 'GGT', + 'ggt( gamma gt)': 'GGT', + 'alp': 'ALP', 'alkaline phosphatase': 'ALP', 'alp(alkaline phosphatase)': 'ALP', + 'total bilirubin': 'TBil', 'bilirubin(total)': 'TBil', + 'direct bilirubin': 'DBil', 'bilirubin(direct)': 'DBil', 'bilirubin (direct)': 'DBil', + 'ldh': 'LDH', 'lactate dehydrogenase': 'LDH', 'ldh(lactate dehydrogenase)': 'LDH', + 'total protein': 'TP', + 'albumin': 'ALB', 'albumir': 'ALB', # OCR可能识别错误 + 'globulin': 'GLB', + + # 肾功能 + 'bun': 'BUN', 'urea nitrogen': 'BUN', 'blood urea nitrogen': 'BUN', + 'creatinine': 'Scr', + 'uric acid': 'UA', + 'egfr': 'eGFR', 'egfr for thai': 'eGFR', + + # 电解质 + 'sodium': 'Na', + 'potassium': 'K', + 'chloride': 'Cl', + 'tco2': 'TCO2', + 'anion gap': 'AG', + 'calcium': 'Ca', + 'phosphorus': 'P', 'phosphate': 'P', 'inorganic phosphate': 'P', + 'magnesium': 'Mg', 'magnesium(mg)': 'Mg', + + # 凝血功能 - 注意:partial thromboplastin 要在前面,避免被ast匹配 + 'partial thromboplastin time': 'APTT', 'activated partial thromboplastin': 'APTT', + 'prothrombin time': 'PT', 'prothrombin time(pt)': 'PT', + 'thrombin time': 'TT', 'thrombin time(tt)': 'TT', + 'fibrinogen': 'FIB', 'fibrinogen level': 'FIB', + 'd-dimer': 'D-Dimer', 'fdp d-dimer': 'D-Dimer', + 'aptt': 'APTT', + 'inr': 'INR', + + # 甲状腺 + 'tsh': 'TSH', 'thyroid stimulating': 'TSH', + 'free 
t3': 'FT3', 'free t3(free triiodothyronine)': 'FT3', + 'free t4': 'FT4', 'free t4 (free thyroxine)': 'FT4', + 'total t3': 'T3', 'total t3(triiodothyronine)': 'T3', + 'total t4': 'T4', 'totalt4 (thyroxine)': 'T4', + + # 性激素 - 使用标准ABB: T, COR, DHEAS + 'estradiol(e2)': 'E2', 'estradiol': 'E2', 'estrogen': 'E2', + 'progesterone': 'PROG', + 'testosterone': 'T', # 标准ABB是T + 'fsh': 'FSH', 'follicle stimulating': 'FSH', 'folicle stimulating hormone': 'FSH', + 'folicle stimulating hormone(fsh)': 'FSH', # OCR可能拼写错误 + 'lh(luteinizing hormone)': 'LH', 'lh': 'LH', 'luteinizing hormone': 'LH', + 'prolactin': 'PRL', + 'cortisol': 'COR', # 标准ABB是COR + 'dhea-sulphate': 'DHEAS', 'dhea': 'DHEA', 'dhea-s': 'DHEAS', # 标准ABB是DHEAS + 'igf-1': 'IGF-1', 'igf1': 'IGF-1', + 'calcitonin': 'CT', # 标准ABB是CT + + # 肿瘤标志物 - 使用标准ABB: CA15-3, CA19-9, TPSA + 'afp': 'AFP', 'alpha fetoprotein': 'AFP', 'afp(alpha fetoprotein)': 'AFP', + 'cea': 'CEA', 'carcinoembryonic': 'CEA', 'cea(carcinoembryonic antigen)': 'CEA', + 'ca125': 'CA125', 'ca 125': 'CA125', 'cancer antigen 125': 'CA125', + 'ca153': 'CA15-3', 'ca 15-3': 'CA15-3', 'carbohydrate antigen 15-3': 'CA15-3', 'cancer antigen 15-3': 'CA15-3', # 标准ABB + 'ca199': 'CA19-9', 'ca 19-9': 'CA19-9', 'carbohydrate antigen 19-9': 'CA19-9', # 标准ABB + 'psa': 'TPSA', 'total psa': 'TPSA', 'prostate specific antigen': 'TPSA', # 标准ABB是TPSA + 'free psa': 'FPSA', 'fpsa': 'FPSA', + 'nse': 'NSE', 'neuron specific enolase': 'NSE', + 'cyfra 21-1': 'CYFRA21-1', 'cyfra 21-1(nonsmall cell lung)': 'CYFRA21-1', + 'thyroglobulin': 'Tg', 'tg': 'Tg', # 甲状腺球蛋白 + + # 炎症指标 + 'c-reactive protein(high sens)': 'hs-CRP', 'hs-crp': 'hs-CRP', + 'c-reactive protein high sens': 'hs-CRP', # 无括号版本 + 'crp': 'CRP', 'c-reactive protein': 'CRP', + 'rf': 'RF', 'rheumatoid factor': 'RF', + 'anti streptolysin o titre(aso)': 'ASO', 'anti streptolysin o titre': 'ASO', + 'aso': 'ASO', 'anti-streptolysin': 'ASO', + + # 免疫球蛋白 + 'immunoglobulin g(igg)': 'IgG', 'immunoglobulin g': 'IgG', 'igg': 'IgG', + 
'immunoglobulin a(iga)': 'IgA', 'immunoglobulin a': 'IgA', 'iga': 'IgA', + 'immunoglobulin m(igm)': 'IgM', 'immunoglobulin m': 'IgM', 'igm': 'IgM', + 'immunoglobulin e(ige)': 'IgE', 'immunoglobulin e': 'IgE', 'ige': 'IgE', + 'complement c3(b1c)': 'C3', 'complement c3': 'C3', 'c3': 'C3', + 'complement c4': 'C4', 'c4': 'C4', 'complement c4': 'C4', + + # 淋巴细胞亚群 + 'cd3+': 'CD3+', 'cd3': 'CD3+', 't lymphocyte': 'CD3+', 't-lymphocyte': 'CD3+', + 'cd3+ t lymphocyte': 'CD3+', 'cd3+t': 'CD3+', + 'cd4+': 'CD4+', 'cd4': 'CD4+', 'helper t cell': 'CD4+', 'cd4+ t helper': 'CD4+', + 'cd4+t': 'CD4+', 'cd4+ helper': 'CD4+', + 'cd8+': 'CD8+', 'cd8': 'CD8+', 'cytotoxic t cell': 'CD8+', 'cd8+ t cytotoxic': 'CD8+', + 'cd8+t': 'CD8+', 'suppressor t cell': 'CD8+', + 'cd4/cd8': 'CD4/CD8', 'cd4/cd8 ratio': 'CD4/CD8', 'cd4:cd8': 'CD4/CD8', + 'nk cell': 'NK', 'nk cells': 'NK', 'natural killer': 'NK', 'cd16+cd56+': 'NK', + 'cd16/cd56': 'NK', 'nk': 'NK', '% nk cell': 'NK', 'flowcytometry for nk cell': 'NK', + 'b lymphocyte': 'B-Lymph', 'b-lymphocyte': 'B-Lymph', 'b cell': 'B-Lymph', + 'cd19+': 'B-Lymph', 'cd19': 'B-Lymph', + 't lymphocyte count': 'T-Lymph', 't-lymphocyte count': 'T-Lymph', + + # 自身抗体 + 'ana': 'ANA', 'antinuclear antibody': 'ANA', + 'thyroglobulin antibody': 'TgAb', + + # 传染病 - 使用标准ABB: HCV + 'hbsag(hepatitis b surface antigen)': 'HBsAg', 'hepatitis b surface antigen': 'HBsAg', 'hbsag': 'HBsAg', + 'hbsab(hepatitis b surface antibody)': 'HBsAb', 'hepatitis b surface antibody': 'HBsAb', 'hbsab': 'HBsAb', + 'hbe ag(hepatitis be antigen)': 'HBeAg', 'hepatitis be antigen': 'HBeAg', 'hbeag': 'HBeAg', + 'hbe ab(hepatitis be antibody)': 'HBeAb', 'hepatitis be antibody': 'HBeAb', 'hbeab': 'HBeAb', + 'hbcab(hepatitis b core antibody)': 'HBcAb', 'hepatitis b core antibody': 'HBcAb', 'hbcab': 'HBcAb', + 'hcv ab (hepatitis c antibody)': 'HCV', 'hepatitis c antibody': 'HCV', 'anti-hcv': 'HCV', # 标准ABB是HCV + 'hiv-1/hiv-2 antibody': 'HIV', 'hiv': 'HIV', + 'rpr (rapid plasma reagin)': 'TRUST', 
'rapid plasma reagin': 'TRUST', 'rpr': 'TRUST', + 'rpr(rapid plasma reagin)': 'TRUST', # 无空格版本 # 标准ABB是TRUST + 'h.pylori': 'H.pylori', 'helicobacter': 'H.pylori', + + # 血型 - Rh(D)是标准ABB + 'abo group': 'ABO', 'abo blood group': 'ABO', + 'rh group': 'Rh', 'rh blood group': 'Rh', + 'rh(d)': 'Rh(D)', 'rh factor': 'Rh(D)', 'rh-d': 'Rh(D)', + + # 尿检 + 'color': 'Color', 'colour': 'Color', + 'transparency': 'Clarity', + 'specific gravity': 'SG', + 'ph': 'pH', + 'protein': 'PRO', + 'ketone': 'KET', + 'bilirubin': 'BIL', + 'urobilinogen': 'URO', + 'nitrite': 'NIT', + 'leukocyte': 'LEU', 'leucocyte': 'LEU', + 'erythrocyte': 'ERY', + 'squamous epithelial cell': 'SEC', 'squamous epithelial': 'SEC', + 'calcium oxalate crystal': 'CRY', 'calcium oxalate': 'CRY', # 标准ABB是CRY + + # 微量元素/重金属 - 使用标准ABB: Fer, 25-OH-VD2+D3, Hcy + 'iron': 'Fe', 'serum iron': 'Fe', + 'ferritin': 'Fer', # 标准ABB是Fer + 'zinc': 'Zn', + 'copper': 'Cu', + 'vitamin b12': 'VitB12', 'vit b12': 'VitB12', + 'folate': 'Folate', 'folic acid': 'Folate', + 'vitamin d(25-oh vitamin d total)': '25-OH-VD2+D3', 'vitamin d': '25-OH-VD2+D3', '25-oh vitamin d': '25-OH-VD2+D3', # 标准ABB + '25-hydroxyvitamin d': '25-OH-VD2+D3', '25-oh-vitd': '25-OH-VD2+D3', + 'homocysteine': 'Hcy', # 标准ABB是Hcy + 'lead in blood': 'Pb', 'lead': 'Pb', + 'mercury in blood': 'Hg', 'mercury': 'Hg', + 'cadmium in blood': 'Cd', 'cadmium': 'Cd', + 'chromium in blood': 'Cr', 'chromium': 'Cr', + 'manganese in blood': 'Mn', 'manganese': 'Mn', + 'nickel in blood': 'Ni', 'nickel': 'Ni', + + # 心肌酶 + 'ck-mb': 'CK-MB', 'creatine kinase-mb': 'CK-MB', + 'creatine kinase': 'CK', + + # 骨代谢 - 使用标准ABB: OST, TPINP, β-CTX + 'n-mid osteocalcin': 'OST', 'osteocalcin': 'OST', # 标准ABB是OST + 'p1np': 'TPINP', 'total procollagen': 'TPINP', # 标准ABB是TPINP + 'beta crosslap': 'β-CTX', 'ctx': 'β-CTX', 'b-ctx': 'β-CTX', 'beta-crosslaps': 'β-CTX', # 标准ABB是β-CTX + 'pth(intact)': 'PTH', 'pth': 'PTH', 'parathyroid hormone': 'PTH', + } + + # 跳过关键词 + skip_words = [ + 'page ', 'patient 
name', 'doctor:', 'laboratory', 'specimen', + 'collected date', 'printed', 'bangkok', 'thailand', + 'tel.', 'fax.', 'email:', 'iso 15189', 'iso15189', + 'accreditation', 'lab no', 'mrn', 'requested date', + 'received date', 'address/', 'sex :', 'sex:', 'age :', + 'dob :', 'ref.no', 'copyright', 'reported by', 'authorised by', + 'print date', 'remark:', 'confidential', 'this report', + 'reference range', 'test name', 'result unit', 'edta', + 'morphology', 'adequate', 'differential count', + 'complete blood count', 'issue date', 'revision', 'normal range', + 'for 10-year', 'this equation', 'calculated by', + 'approved by', 'trimester', 'women(', 'female 21', + 'comment:', 'method:', 'method.', 'serum', + 'borderline', 'optimal', 'near optimal', 'very high', + 'low risk', 'average risk', 'high risk', 'aha', 'cdc', + 'national', 'healthcare', 'systems', 'new petchburi', + 'physical examination', 'chemical examination', 'urine sedimentation', + 'result comment', 'repeated result', + 'immunoturbidimetric', 'electrochemiluminescence', + 'macroscopic', 'sensitivity:', 'fta-abs', 'tpha', + 'reactive screening', 'gold standard', 'syphilis', + 'egfr comment', 'ckd - epi', 'kidney foundation', + 'ascvd', 'cardiovascular', 'diabetes mellitus', 'target should', + ] + + # 按key长度排序 + sorted_keys = sorted(name_to_abb.keys(), key=len, reverse=True) + + def find_abb(project_name): + """查找项目对应的ABB""" + pl = project_name.lower().strip() + # 移除点号和冒号 + pl = re.sub(r'[\.:\s]+$', '', pl) + pl = re.sub(r'\.{2,}', '', pl) + + for key in sorted_keys: + if key in pl: + return name_to_abb[key] + + # 生成ABB + words = [w for w in project_name.split() if len(w) > 0 and w[0].isalpha()] + if words: + return ''.join([w[0].upper() for w in words])[:6] + return project_name[:6].upper() + + def parse_value(text): + """解析数值,返回 (result, point, unit)""" + text = text.strip() + + # 特殊处理:跳过开头的单独点号(OCR可能把分隔符识别为点号) + # 如 ". Negative" -> "Negative" + if text.startswith('. 
') or text.startswith('。 '): + text = text[2:].strip() + elif text == '.' or text == '。': + return None, '', '' + + # 特殊处理:如果结果只是连续的点号,说明OCR识别错误,返回None + # 如 ".............." 应该被跳过 + if re.match(r'^\.{3,}$', text): + return None, '', '' + + # 特殊处理:如果包含冒号,可能是 "<20.0 pg/mL: Normal" 格式 + if ':' in text: + parts = text.split(':') + text = parts[0].strip() # 只取冒号前的部分 + + # 格式0: "5.95 *10^3/mm3" 或 "4.69 *10^6/mm3" 或 "209 10^3/mm3" + m = re.match(r'^([<>]?[\d\.]+)\s*([HL])?\s*(\*?10\^[\d]+[/a-zA-Z0-9\^]+)', text, re.IGNORECASE) + if m: + result = m.group(1) + point = '' + if m.group(2): + point = '↑' if m.group(2).upper() == 'H' else '↓' + unit = m.group(3) + return result, point, unit + + # 格式0.5: "41.3 号" (OCR识别错误,号应该是%) + m = re.match(r'^([<>]?[\d\.]+)\s*([HL])?\s*号', text, re.IGNORECASE) + if m: + result = m.group(1) + point = '' + if m.group(2): + point = '↑' if m.group(2).upper() == 'H' else '↓' + return result, point, '%' + + # 格式1: "230H" 或 "5.7H%" 或 "140H mg/dL" + m = re.match(r'^([<>]?[\d\.]+)\s*([HL])\s*(%)?(.*)$', text, re.IGNORECASE) + if m: + result = m.group(1) + point = '↑' if m.group(2).upper() == 'H' else '↓' + unit = (m.group(3) or '') + (m.group(4) or '').strip() + return result, point, unit + + # 格式2: "158.00mg/dL" (数值和单位连在一起) + m = re.match(r'^([<>]?[\d\.]+)([a-zA-Z/%][a-zA-Z0-9/%\^\*]*)$', text) + if m: + return m.group(1), '', m.group(2) + + # 格式3: "113.00 H IU/mL" + m = re.match(r'^([<>]?[\d\.]+)\s+([HL])\s+(.+)$', text, re.IGNORECASE) + if m: + point = '↑' if m.group(2).upper() == 'H' else '↓' + return m.group(1), point, m.group(3).strip() + + # 格式4: 纯数值 "95" 或 "18.4" 或 "<20.0" + m = re.match(r'^([<>]?[\d\.]+)$', text) + if m: + return m.group(1), '', '' + + # 格式5: 带单位 "20 H mm/hr" 或 "5.07H mg/L" 或 "<20.0 pg/mL" + m = re.match(r'^([<>]?[\d\.]+)\s*([HL])?\s*([a-zA-Z/%\*].*)$', text, re.IGNORECASE) + if m: + point = '' + if m.group(2): + point = '↑' if m.group(2).upper() == 'H' else '↓' + return m.group(1), point, m.group(3).strip() + + # 格式6: 定性结果 
+ qualitative = ['positive', 'negative', 'reactive', 'non reactive', 'non-reactive', + 'normal', 'abnormal', 'yellow', 'clear', 'straw', 'amber', + 'a', 'b', 'ab', 'o', 'less than'] + text_lower = text.lower() + for q in qualitative: + if text_lower.startswith(q): + return text.split()[0], '', '' + + # 格式7: 范围结果 "0-1 Cells/HPF" 或 "2-3 Cells/HPF" + m = re.match(r'^(\d+\-\d+)\s*(.*)$', text) + if m: + return m.group(1), '', m.group(2).strip() + + return None, '', '' + + def is_project_line(line): + """判断是否是项目名行""" + # 移除开头的(*) + clean = re.sub(r'^\(\*\)', '', line).strip() + + # 如果移除(*)后只剩下很短的文本,可能只是标题行,不是项目行 + # 如 "(*)ESR" -> "ESR",这种情况下应该继续查找下一行 + if len(clean) < 5 and '...' not in line and ':' not in line and ':' not in line: + return False + + # 包含连续点号的行 + if '...' in line or '…' in line: + return True + + # 以冒号结尾 + if line.endswith(':') or line.endswith(':'): + return True + + # 包含冒号且冒号后有内容(如 "项目名 : 结果") + if ':' in line or ':' in line: + return True + + # 已知项目名 + line_lower = line.lower() + for key in sorted_keys: + if key in line_lower: + return True + + return False + + def extract_project_and_result(line): + """从行中提取项目名和结果(处理多种格式)""" + # 移除开头的(*) + line = re.sub(r'^\(\*\)', '', line).strip() + + # 格式0: "FSH. 5.85 mIU/mL" - 项目名后是单个点号加空格加数值 + # 匹配: 项目名. 数值 单位 + m = re.match(r'^([A-Za-z][A-Za-z0-9\-\(\)]+)\.\s+([<>]?[\d\.]+)\s*([a-zA-Z/%].*)$', line) + if m: + project = m.group(1).strip() + rest = m.group(2) + ' ' + m.group(3) + result, point, unit = parse_value(rest) + if result: + return project, result, point, unit, '' + + # 格式1: "项目名...... 结果 [Normal: xxx]" 或 "项目名...... 结果 (参考范围)" + # 如 "Color........................ Yellow [Normal : Yellow]" + # 或 "pH......... 6.0 (4.5-8.0)" + # 注意:先检查点号分隔,因为有些行包含 [Normal: xxx] 中的冒号 + if '...' in line or '…'
in line: + # 用点号分割 + parts = re.split(r'\.{2,}', line, maxsplit=1) + if len(parts) == 2: + project = parts[0].strip() + rest = parts[1].strip() + + # 去掉rest开头的冒号(如 ": 5.07H mg/L") + rest = re.sub(r'^[:\:]\s*', '', rest) + + if rest: + # 先提取 [Normal: xxx] 格式的参考范围 + reference = '' + normal_match = re.search(r'\[Normal\s*[:\:]\s*([^\]]+)\]', rest, re.IGNORECASE) + if normal_match: + reference = f'[Normal: {normal_match.group(1).strip()}]' + rest = rest[:normal_match.start()].strip() + + # 再提取 (xxx) 格式的参考范围 + if not reference: + ref_match = re.search(r'\(([^\)]+)\)\s*$', rest) + if ref_match: + reference = f'({ref_match.group(1)})' + rest = rest[:ref_match.start()].strip() + + # 解析结果和单位 + result, point, unit = parse_value(rest) + + # 特殊处理:如果结果为空(可能是OCR只识别到点号),但有[Normal: xxx]参考范围 + # 对于尿检项目,如果参考范围是Negative,结果也应该是Negative + if not result and reference: + # 从参考范围中提取预期值 + normal_val_match = re.search(r'\[Normal[:\:]\s*([^\]]+)\]', reference, re.IGNORECASE) + if normal_val_match: + expected_val = normal_val_match.group(1).strip() + # 如果预期值是定性结果(如Negative, Yellow等),使用它作为结果 + qualitative_vals = ['negative', 'positive', 'normal', 'yellow', 'clear', 'straw', 'amber', 'not found'] + if expected_val.lower() in qualitative_vals: + result = expected_val + point = '' + unit = '' + + if result: + return project, result, point, unit, reference + + # 格式2: "项目名...: 结果 单位 (参考范围)" 或 "项目名: 结果" + if ':' in line or ':' in line: + # 使用正则分割,支持中英文冒号 + parts = re.split(r'[:\:]', line, maxsplit=1) + if len(parts) == 2: + project = parts[0].strip() + rest = parts[1].strip() + + # 清理项目名中的点号 + project = re.sub(r'\.{2,}', '', project).strip() + + # 解析rest部分 + if rest: + # 先提取 [Normal: xxx] 格式的参考范围 + reference = '' + normal_match = re.search(r'\[Normal\s*[:\:]\s*([^\]]+)\]', rest, re.IGNORECASE) + if normal_match: + reference = f'[Normal: {normal_match.group(1).strip()}]' + rest = rest[:normal_match.start()].strip() + + # 再提取 (xxx) 格式的参考范围 + if not reference: + ref_match = 
re.search(r'\(([^\)]+)\)\s*$', rest) + if ref_match: + reference = f'({ref_match.group(1)})' + rest = rest[:ref_match.start()].strip() + + # 解析结果和单位 + result, point, unit = parse_value(rest) + + if result: + return project, result, point, unit, reference + + # 格式3: "项目名 结果" 格式(无点号无冒号) + # 如 "INR 0.93" 或 "Color Yellow" + parts = line.split() + if len(parts) >= 2: + potential_project = parts[0] + potential_value = ' '.join(parts[1:]) + + # 检查是否是已知项目 + pl = potential_project.lower() + for key in sorted_keys: + if pl == key or key in pl: + result, point, unit = parse_value(potential_value) + if result: + return potential_project, result, point, unit, '' + break + + # 如果没有冒号也没有点号,返回原始项目名 + project = re.sub(r'\.{2,}', '', line).strip() + project = re.sub(r'[:\:]+\s*$', '', project).strip() + return project, None, '', '', '' + + # 主解析循环 + i = 0 + while i < len(lines): + line = lines[i].strip() + line_lower = line.lower() + + # 跳过无关行 + if any(w in line_lower for w in skip_words): + i += 1 + continue + + # 跳过空行和太短的行 + if len(line) < 2: + i += 1 + continue + + # 跳过纯数字参考范围行 如 "(0-15)" 或 "(<200)" + if re.match(r'^\([<>]?[\d\.\-]+\)$', line): + i += 1 + continue + + # 跳过纯单位行 + if re.match(r'^[a-zA-Z/%\^]+$', line) and len(line) < 10: + i += 1 + continue + + # 检查是否是项目名行 + if is_project_line(line): + # 尝试从同一行提取项目名和结果 + project, result, point, unit, reference = extract_project_and_result(line) + + # 跳过太短或太长的项目名 + if len(project) < 2 or len(project) > 60: + i += 1 + continue + + # 跳过噪音项目 + if project.lower() in ['report by', 'reported by', 'health', 'age', 'high', 'low']: + i += 1 + continue + + abb = find_abb(project) + + # 如果同一行没有结果,查找下一行 + if result is None: + j = i + 1 + while j < len(lines) and j < i + 5: + next_line = lines[j].strip() + next_lower = next_line.lower() + + # 跳过无关行 + if any(w in next_lower for w in skip_words): + j += 1 + continue + + # 如果是新的项目名(包含冒号或点号分隔),停止 + if is_project_line(next_line) and (':' in next_line or '...' in next_line): + break + + # 如果是 "FSH. 
5.85 mIU/mL" 格式的行,也停止(让主循环处理) + if re.match(r'^[A-Za-z][A-Za-z0-9\-\(\)]+\.\s+[<>]?[\d\.]+', next_line): + break + + # 参考范围行 + if next_line.startswith('(') and ')' in next_line: + reference = next_line + j += 1 + continue + + # [Normal: xxx] 格式 + if next_line.startswith('['): + j += 1 + continue + + # 尝试解析为结果 + r, p, u = parse_value(next_line) + if r: + result = r + point = p + unit = u + j += 1 + + # 继续查找单位和参考范围 + while j < len(lines) and j < i + 5: + next2 = lines[j].strip() + + # 单位行 + if re.match(r'^[a-zA-Z/%\^][a-zA-Z0-9/%\^\*]+$', next2) and not unit: + unit = next2 + j += 1 + continue + + # 参考范围 + if next2.startswith('(') and ')' in next2: + reference = next2 + j += 1 + continue + + break + break + + j += 1 + + i = j + else: + i += 1 + + # 保存结果 + if result and abb: + # 过滤噪音 + if project.lower() in ['age', 'high', 'low', 'a', 'h', 'l', 'report by']: + continue + if len(project) > 50: + continue + + # 白细胞分类项目特殊处理:根据参考范围判断是数量还是百分比 + wbc_diff_abbs = {'NEUT', 'LYMPH', 'MONO', 'EOS', 'BAS'} + if abb.upper() in wbc_diff_abbs: + is_percentage = False + # 检查单位是否是百分比 + if unit and '%' in unit: + is_percentage = True + # 检查参考范围是否是百分比形式(0-100之间的数值) + elif reference: + ref_match = re.search(r'\(?([\d\.]+)\s*[-–]\s*([\d\.]+)\)?', reference) + if ref_match: + try: + low = float(ref_match.group(1)) + high = float(ref_match.group(2)) + # 如果参考范围在0-100之间,且没有10^3等单位标识,认为是百分比 + if 0 <= low <= 100 and 0 <= high <= 100 and '10^' not in reference and '*10' not in reference: + is_percentage = True + except: + pass + + if is_percentage: + abb = abb.upper() + '%' + if not unit: + unit = '%' + + items.append({ + 'abb': abb, + 'project': project, + 'result': result, + 'point': point, + 'unit': unit, + 'reference': reference, + 'source': source_file + }) + + # 白细胞分类项目特殊处理:检查下一行是否是绝对值数据 + # PDF格式如: + # Neutrophils............... 
54.4 % (46.5-75.0) + # 3237 /mm3 (2000-7500) + wbc_diff_names = {'neutrophil', 'lymphocyte', 'monocyte', 'eosinophil', 'basophil', + 'neutrophils', 'lymphocytes', 'monocytes', 'eosinophils', 'basophils'} + if project.lower() in wbc_diff_names or any(n in project.lower() for n in wbc_diff_names): + # 查找下一行的绝对值数据 + # 注意:此时 i 已经指向下一行,所以从 i 开始查找 + next_idx = i + search_limit = min(next_idx + 5, len(lines)) # 最多查找5行 + while next_idx < search_limit: + next_line = lines[next_idx].strip() + # 跳过空行 + if not next_line: + next_idx += 1 + continue + # 如果是新的项目名行,停止 + if is_project_line(next_line): + break + # 检查是否是绝对值数据行(数值 + /mm3 或 10^3/mm3 等单位) + # 格式如:3237 /mm3 (2000-7500) 或 3237 10^3/mm3 (2000-7500) + abs_match = re.match(r'^\s*([<>]?[\d\.]+)\s*([HL])?\s*(/mm3|\*?10\^[\d]+[/a-zA-Z0-9\^]+|[/a-zA-Z0-9\^]+mm3)\s*(\([^\)]+\))?', next_line, re.IGNORECASE) + if abs_match: + abs_result = abs_match.group(1) + abs_point = '' + if abs_match.group(2): + abs_point = '↑' if abs_match.group(2).upper() == 'H' else '↓' + abs_unit = abs_match.group(3) if abs_match.group(3) else '' + abs_reference = abs_match.group(4) if abs_match.group(4) else '' + + # 生成绝对值的ABB(去掉%后缀,或使用原始ABB) + base_abb = abb.replace('%', '').upper() + + items.append({ + 'abb': base_abb, + 'project': project, + 'result': abs_result, + 'point': abs_point, + 'unit': abs_unit, + 'reference': abs_reference, + 'source': source_file + }) + next_idx += 1 + break + next_idx += 1 + continue + + # 检查是否是 "项目名 结果" 格式(无点号无冒号) + # 如 "Color Yellow" 或 "pH 6.0" + parts = line.split() + if len(parts) >= 2: + potential_project = parts[0] + potential_value = ' '.join(parts[1:]) + + # 检查是否是已知项目 + pl = potential_project.lower() + for key in sorted_keys: + if pl == key or pl.startswith(key): + abb = name_to_abb[key] + result, point, unit = parse_value(potential_value) + if result: + items.append({ + 'abb': abb, + 'project': potential_project, + 'result': result, + 'point': point, + 'unit': unit, + 'reference': '', + 'source': source_file + }) + 
break + + i += 1 + + return items + + +def clean_extracted_data_v2(items: list) -> list: + """清洗提取的数据""" + cleaned = [] + seen = set() # 去重 + + # 噪音ABB列表 + noise_abbs = { + 'A', 'H', 'L', 'R', 'AGE', 'NHY', 'D', 'O', 'RB', 'N', 'Q', 'C', 'J', 'Y', 'FY', 'OEP', + 'F', 'M', 'MY', 'S', 'AC', 'AH', 'AR', 'AS', 'WCC', # 新增噪音 + } + + # 噪音项目名 - 使用单词边界匹配 + noise_projects = [ + 'received', 'collected', 'report by', 'reported by', + 'health', 'name', 'dob', 'oct', 'patient', 'doctor', 'lab no', 'mrn', + 'sex', 'address', 'ref.no', 'requested', 'printed', 'page', + 'female', 'male', 'adult', 'sep ', 'anti-n rnp', 'anti smith', # 新增噪音 + 'absolute count', 'white cell count', # 这些是NK cell的附属数据,不是独立项目 + ] + + # 需要完全匹配的噪音词(避免误过滤如 "High Sens", "Average") + noise_exact = ['high', 'low', 'age'] + + for item in items: + abb = item.get('abb', '').upper() + result = item.get('result', '') + project = item.get('project', '') + project_lower = project.lower() + + # 过滤无效数据 + if not abb or not result: + continue + + # 修复无效结果:如果结果是连续点号,尝试从参考范围中提取 + if re.match(r'^\.{3,}$', result): + reference = item.get('reference', '') + if reference: + # 从参考范围中提取预期值 + normal_val_match = re.search(r'\[Normal[:\:]\s*([^\]]+)\]', reference, re.IGNORECASE) + if normal_val_match: + expected_val = normal_val_match.group(1).strip() + # 如果预期值是定性结果,使用它作为结果 + qualitative_vals = ['negative', 'positive', 'normal', 'yellow', 'clear', 'straw', 'amber', 'not found'] + if expected_val.lower() in qualitative_vals: + result = expected_val + item['result'] = result + else: + continue # 无法修复,跳过 + else: + continue # 无法修复,跳过 + else: + continue # 无参考范围,跳过 + + # 过滤噪音ABB + if abb in noise_abbs: + continue + + # 过滤噪音项目名 + if any(n in project_lower for n in noise_projects): + continue + + # 过滤完全匹配的噪音词 + if project_lower in noise_exact: + continue + + # 过滤太短的项目名(但保留已知的短项目名如pH) + known_short_projects = {'ph', 'k', 'p', 'na', 'cl', 'mg', 'ca', 'fe', 'zn', 'cu', 'pb', 'hg', 'cd', 'cr', 'mn', 'ni'} + if len(project) < 3 and 
project_lower not in known_short_projects: + continue + + # 去重 - 使用ABB和结果组合 + key = f"{abb}:{result}" + if key in seen: + continue + seen.add(key) + + cleaned.append(item) + + return cleaned + + +if __name__ == '__main__': + # 测试 + test_text = """ + ABO Group.................: + B + Rh Group...................: + Positive + ESR 1 Hour ...................: + 20 H mm/hr + (0-15) + Thrombin Time(TT)............: + 18.4 + Secs. + (15.8-19.0) + Cholesterol...................: + 230H + mg/dL + (<200) + Color........................ + Yellow + [Normal : Yellow] + pH......... + 6.0 + (4.5-8.0) + Immunoglobulin M(IgM)......: + 158.00mg/dL + (40.00-230.00) + Sodium........................: + 141 + mmol/L + (136-145) + Potassium.....................: + 4.77 + mmol/L + (3.50-5.10) + Hemoglobin(Hb) : 13.8 g/dL (13.0-18.0) + Mean Cell Volume : 88.1 fL (80.0-100.0) + HDL-Cholesterol : 41 mg/dL (>40) + LDL Direct : 140H mg/dL (<130) + """ + + items = parse_medical_data_v2(test_text, 'test.pdf') + items = clean_extracted_data_v2(items) + + print(f"提取了 {len(items)} 个项目:") + for item in items: + print(f" {item['abb']}: {item['project'][:30]} = {item['result']} {item['point']} {item['unit']} {item['reference']}") diff --git a/backend/pdf_item_order.json b/backend/pdf_item_order.json new file mode 100644 index 0000000..91d5407 --- /dev/null +++ b/backend/pdf_item_order.json @@ -0,0 +1,274 @@ +{ + "Urine Test": [ + "16", + "17", + "Color", + "pH", + "TuR", + "PRO", + "18", + "BLD/ERY", + "GLU", + "SG", + "WBC", + "19", + "NIT", + "KET" + ], + "Complete Blood Count": [ + "20", + "21", + "Hb", + "HCT", + "22", + "MCV", + "MCH", + "MCHC", + "RDW", + "23", + "RBC", + "Morphology", + "WBC", + "count", + "NEUT", + "NEUT%", + "EOS", + "EOS%", + "24", + "BAS", + "BAS%", + "LYMPH", + "LYMPH%", + "MONO", + "MONO%", + "25", + "PLT", + "PCT", + "MPV", + "PDW", + "26" + ], + "Blood Sugar": [ + "27", + "28", + "FBS", + "HbA1C" + ], + "Lipid Profile": [ + "29", + "30", + "TC", + "TG", + "HDL", + "31", 
+ "LDL" + ], + "Blood Type": [ + "32", + "33", + "Blood", + "type" + ], + "Blood Coagulation": [ + "34", + "35", + "PT", + "APTT", + "TT", + "36", + "FIB" + ], + "Four Infectious Diseases": [ + "37", + "38", + "HIV", + "TRUST", + "TPPA", + "HBsAg", + "39", + "HBsAb", + "HBeAg", + "HBeAb", + "HBcAb", + "40", + "IgM" + ], + "Serum Electrolytes": [ + "41", + "42", + "Kalium", + "Sodium", + "Chloride", + "Calcium", + "43", + "Magnesium", + "Phosphorus" + ], + "Liver Function": [ + "44", + "45", + "TP", + "A/G", + "46", + "TBil", + "DBil", + "IBil", + "ALP", + "47", + "ALT", + "AST", + "GGT" + ], + "Kidney Function": [ + "48", + "49", + "Scr", + "BUN", + "UA" + ], + "Myocardial Enzyme": [ + "50", + "51", + "CK", + "LDH" + ], + "Thyroid Function": [ + "52", + "53", + "T3", + "T4", + "FT3", + "FT4", + "54", + "TSH" + ], + "Thromboembolism": [ + "55", + "Thromboembolism", + "56", + "Homocysteine", + "Dimer" + ], + "Bone Metabolism": [ + "57", + "58", + "25-OH-", + "VD2+D3", + "D2+D3", + "PTH", + "OST", + "TPINP", + "59" + ], + "Microelement": [ + "60", + "Microelement", + "61", + "Pb", + "Cu", + "Zn", + "Mg", + "62", + "Fe", + "63", + "64", + "CD3+", + "CD4+", + "CD8+" + ], + "Lymphocyte Subpopulation": [ + "63", + "64", + "CD3+", + "CD4+", + "CD8+" + ], + "Humoral Immunity": [ + "65", + "66", + "IgG", + "IgA", + "IgM", + "IgE", + "67", + "C3", + "C4" + ], + "Inflammatory Reaction": [ + "68", + "69", + "CRP", + "ESR", + "ASO" + ], + "Autoantibody": [ + "70", + "Autoantibody", + "71", + "ANA", + "RF" + ], + "Female Hormone": [ + "72", + "73", + "E2", + "PROG", + "FSH", + "74", + "LH", + "PRL", + "DHEAS", + "75", + "COR", + "IGF1", + "AMH" + ], + "Male Hormone": [ + "76", + "77", + "DHEAS", + "IGF1", + "PROG", + "78", + "FSH", + "LH", + "PRL", + "Cortisol", + "79", + "E2" + ], + "Tumor Markers": [ + "80", + "81", + "AFP", + "CEA", + "CA19-9", + "Fer", + "82", + "NES", + "Tg", + "CT", + "EA-IgA", + "83", + "TPSA", + "FPSA", + "E/TPSA", + "84", + "CA125", + "CA15-3", + "PSA" + 
], + "Imaging": [ + "85", + "Imaging", + "86", + "ECG", + "87", + "Color", + "Doppler", + "Ultrasound", + "88", + "CT", + "Examination" + ] +} \ No newline at end of file diff --git a/backend/prompts/functional_health_advice_prompt.md b/backend/prompts/functional_health_advice_prompt.md new file mode 100644 index 0000000..06d9110 --- /dev/null +++ b/backend/prompts/functional_health_advice_prompt.md @@ -0,0 +1,267 @@ +# 功能医学健康建议生成提示词 + +## 角色设定 +你是Be.U Med功能医学团队的资深健康管理顾问,在功能医学、营养医学、运动医学、睡眠医学及生活方式干预领域具有丰富的临床经验。你擅长从功能医学整体观出发,结合检测数据为客户制定个性化的健康管理方案。 + +## 任务 +根据体检者的血液检查报告异常指标,撰写"功能医学健康建议"方案。该方案位于「医学干预」建议方案之后,是对医学干预的延伸与补充,侧重于日常可执行的健康管理策略。 + +--- + +## 核心原则(必须严格遵守) + +### 1. 段落格式(极其重要!) +- **每个段落必须先写英文,再写对应的中文** +- **英文段落:80-120词,不超过150词** +- **中文段落:80-120字,不超过150字** +- 不要英中混排,必须分开 + +### 2. 禁止输出检测数值 +- **全篇禁止出现任何检验指标的具体数值、参考区间、单位、百分号或数字(0-9)** +- 只能用定性描述:正常/稳定/良好/偏高/偏低/接近上限/接近下限 +- 不要抄写原始报告中的结果值、参考区间、单位 +- **可以提及指标名称**(如"黄体酮偏低""ESR升高"),但不要写具体数值 + +### 3. 语言风格 +- 专业、客观、严谨,体现功能医学视角 +- 使用肯定、明确的表述,如"建议""支持""助力""优化""需要""应当" +- **严禁使用任何不确定表述**,包括但不限于: + - 中文禁用词:可能、也许、或许、大概、似乎、有概率、有可能、可能会、或可、疑似、倾向于、趋向于、不排除、有待、存在...的可能 + - 英文禁用词:may, might, could, possibly, probably, perhaps, likely, potentially, tend to, appear to, seem to, it is possible that +- 禁用"必须""一定""保证""治愈"等绝对化表述 +- 不做临床疾病诊断,聚焦功能状态改善 + +--- + +## 文章结构(必须严格遵循) + +### 总述引导(固定模板+动态填充,共4个段落) + +**段落1(英文,固定模板)**: +"Functional medicine goes beyond the diagnosis and medical treatment of diseases, placing greater emphasis on comprehensive health management for each individual. Beyond the aforementioned "medical intervention", the core of functional medicine lies in helping individuals improve their lifestyle from the root, optimize bodily functions, and enhance overall health. Through a comprehensive assessment of metabolism, immunity, hormones, nutrition, emotions, and daily habits, a personalized health optimization pathway can be tailored for each client." 
+ +**段落2(中文,固定模板)**: +"功能医学不仅仅停留在疾病的诊断与医学治疗,更强调对个体的全方位健康管理。在上述「医学干预」之外,功能医学的核心在于帮助人们从源头改善生活方式、优化身体功能与提升整体健康状态。通过对代谢、免疫、荷尔蒙、营养、情绪及生活习惯等多个维度的综合评估,可以为每一位客户量身定制个性化的健康提升路径。" + +**段落3(英文,固定模板+动态填充)**: +"Based on your test results and individual health status, the Be.U Med Functional Medicine Team provides you with scientific and actionable recommendations in the areas of nutrition adjustment, exercise prescription, sleep and stress management, and lifestyle optimization, aiming to help you achieve long-term health, chronic disease prevention, and overall well-being." + +**段落4(中文,固定模板+动态填充)**: +"基于您的检测结果与个人健康状况,Be.U Med功能医学团队从营养调节、运动处方、睡眠与压力管理、生活方式优化等方面,为您提出科学、可执行的健康建议,旨在帮助您实现真正的长期健康、慢病预防与身心平衡。" + +### 五大模块(固定顺序,每个模块结构相同) + +#### (1) Nutrition Intervention 营养干预 +#### (2) Exercise Intervention 运动干预 +#### (3) Sleep & Stress Management 睡眠与压力管理 +#### (4) Lifestyle Adjustment 生活方式调整 +#### (5) Long-term Follow-up Plan 长期随访计划 + +--- + +### 每个模块的内容结构(必须严格遵循,共包含以下段落) + +**段落A:领域概述(英文,约100词)** +阐述该干预领域在功能医学中的重要性,说明它如何影响代谢、荷尔蒙、免疫等系统。 + +**段落B:领域概述(中文,约120字)** +段落A的中文翻译,内容完全对应。 + +**段落C:检测结果关联分析(英文,约80词)** +结合患者的具体异常指标(只提指标名称和偏高/偏低方向,不写数值),分析为什么需要在该领域进行干预。 + +**段落D:检测结果关联分析(中文,约100字)** +段落C的中文翻译,内容完全对应。 + +**段落E:建议引导语(英文)** +固定为:`Recommended strategies include:` + +**段落F:建议引导语(中文)** +固定为:`建议措施包括:` + +**段落G-N:具体建议(3-5条,每条包含英文+中文)** +- 英文:一句完整的建议,约30-50词 +- 中文:对应的中文翻译,约50-80字 +- 建议要具体、可执行,包含频率、时长、具体食物/运动类型等 +- 每条建议必须与患者的异常指标相关联 +- 中文建议以分号";"结尾(最后一条以句号"。"结尾) + +**段落O:总结意义(英文,约80词)** +总结该领域干预的整体价值,与其他干预的协同作用,以及对功能医学健康管理的意义。 + +**段落P:总结意义(中文,约100字)** +段落O的中文翻译,内容完全对应。 + +--- + +## 各模块内容要点 + +### (1) Nutrition Intervention 营养干预 +- 领域概述:营养对代谢途径、荷尔蒙平衡、细胞修复、免疫稳态的直接影响 +- 检测关联:哪些异常指标与营养摄入不足或不均衡相关 +- 建议方向: + - 微量营养素补充(B族、叶酸、铁、锌、硒、镁等) + - 优质蛋白与健康脂肪摄入 + - 抗炎营养素(Omega-3、维生素C/E、多酚类) + - 益生菌与肠道健康 + - 减少加工食品和促炎食物 + +### (2) Exercise Intervention 运动干预 +- 领域概述:运动对心血管健康、代谢调节、荷尔蒙平衡、心理健康的影响 +- 检测关联:哪些异常指标提示需要特定运动干预 +- 建议方向: + - 有氧运动(快走、骑行、游泳,每周150分钟) + - 阻力与力量训练(每周2-3次) + - 身心结合练习(瑜伽、普拉提、太极) + - 
运动强度与频率的个性化建议 + +### (3) Sleep & Stress Management 睡眠与压力管理 +- 领域概述:睡眠与压力对荷尔蒙节律、免疫平衡、代谢健康、认知表现的影响 +- 检测关联:哪些异常指标与慢性压力、睡眠不足相关 +- 建议方向: + - 睡眠卫生管理(7-8小时,黑暗安静环境,睡前减少电子产品) + - 放松训练(冥想、深呼吸、正念练习) + - 适应原支持(印度人参、红景天、茶氨酸) + - 昼夜节律优化 + +### (4) Lifestyle Adjustment 生活方式调整 +- 领域概述:饮食质量、运动习惯、压力应对、环境暴露对长期健康的综合影响 +- 检测关联:哪些异常指标提示需要生活方式层面的调整 +- 建议方向: + - 均衡饮食模式(全食物、植物多样性) + - 限制烟酒及过量咖啡因 + - 减少毒素暴露(充足饮水、减少环境污染、适度日晒) + - 社交与心理支持 + +### (5) Long-term Follow-up Plan 长期随访计划 +- 领域概述:功能医学强调健康是动态过程,需要持续评估与调整;长期随访帮助及早发现潜在不平衡,确保干预措施不断优化 +- 检测关联:结合本次检测中发现的异常(只提指标名称和偏高/偏低方向),说明需要长期监测的重点领域 +- 建议方向: + - 定期复查荷尔蒙、炎症标志物及免疫相关抗体(每6-12个月) + - 胃肠道评估(如幽门螺杆菌检测、肠道微生态检测) + - 定期功能医学随访,将营养、生活方式及压力管理纳入长期健康方案 + - 骨密度、代谢及心血管风险的定期监测 + +--- + +## 输出格式(JSON) + +```json +{ + "intro": { + "paragraphs": [ + { + "en": "Functional medicine goes beyond the diagnosis and medical treatment of diseases...(固定模板段落1)", + "cn": "功能医学不仅仅停留在疾病的诊断与医学治疗...(固定模板段落2)" + }, + { + "en": "Based on your test results and individual health status...(固定模板段落3)", + "cn": "基于您的检测结果与个人健康状况...(固定模板段落4)" + } + ] + }, + "sections": [ + { + "title_en": "Nutrition Intervention", + "title_cn": "营养干预", + "overview": { + "en": "英文领域概述(约100词):阐述营养在功能医学中的重要性...", + "cn": "中文领域概述(约120字):对应翻译..." + }, + "analysis": { + "en": "英文检测关联分析(约80词):结合异常指标分析营养干预的必要性(不写数值)...", + "cn": "中文检测关联分析(约100字):对应翻译..." 
+ }, + "recommendations": [ + { + "en": "Supplementation of B vitamins, folate, and iron to support hematopoiesis and cellular energy production.", + "cn": "补充维生素B族、叶酸和铁,以支持造血功能和细胞能量产生;" + }, + { + "en": "Adequate protein and healthy fats (zinc, selenium, magnesium as cofactors) to support hormonal balance.", + "cn": "摄入足够的优质蛋白和健康脂肪(并配合锌、硒、镁等辅因子),以维持荷尔蒙平衡;" + }, + { + "en": "Probiotics, antioxidants, and anti-inflammatory nutrients (omega-3, vitamins C/E, polyphenols) to reduce inflammation and protect gut health.", + "cn": "用益生菌、抗氧化与抗炎营养素(如ω-3、维生素C/E、多酚类)降低炎症保护肠道健康。" + } + ], + "summary": { + "en": "英文总结意义(约80词):总结营养干预的整体价值和协同作用...", + "cn": "中文总结意义(约100字):对应翻译..." + } + }, + { + "title_en": "Exercise Intervention", + "title_cn": "运动干预", + "overview": {"en": "...", "cn": "..."}, + "analysis": {"en": "...", "cn": "..."}, + "recommendations": [ + {"en": "...", "cn": "..."}, + {"en": "...", "cn": "..."}, + {"en": "...", "cn": "..."} + ], + "summary": {"en": "...", "cn": "..."} + }, + { + "title_en": "Sleep & Stress Management", + "title_cn": "睡眠与压力管理", + "overview": {"en": "...", "cn": "..."}, + "analysis": {"en": "...", "cn": "..."}, + "recommendations": [...], + "summary": {"en": "...", "cn": "..."} + }, + { + "title_en": "Lifestyle Adjustment", + "title_cn": "生活方式调整", + "overview": {"en": "...", "cn": "..."}, + "analysis": {"en": "...", "cn": "..."}, + "recommendations": [...], + "summary": {"en": "...", "cn": "..."} + }, + { + "title_en": "Long-term Follow-up Plan", + "title_cn": "长期随访计划", + "overview": { + "en": "英文领域概述(约100词):阐述长期随访在功能医学中的重要性,强调健康是动态过程...", + "cn": "中文领域概述(约120字):对应翻译..." + }, + "analysis": { + "en": "英文检测关联分析(约80词):结合异常指标说明需要长期监测的重点领域(不写数值)...", + "cn": "中文检测关联分析(约100字):对应翻译..." + }, + "recommendations": [ + { + "en": "Laboratory reassessment every six to twelve months for hormonal panels, inflammatory markers, and immune-related antibodies.", + "cn": "每六至十二个月复查一次荷尔蒙、炎症标志物及免疫相关抗体;" + }, + { + "en": "Gastrointestinal evaluation (e.g., H. 
pylori testing, gut microbiome assessment) annually or as clinically indicated.", + "cn": "每年或在临床需要时进行胃肠道评估(如幽门螺杆菌检测、肠道微生态检测);" + }, + { + "en": "Regular functional medicine consultations to integrate nutritional, lifestyle, and stress management progress into ongoing health strategies.", + "cn": "定期进行功能医学随访,将营养、生活方式及压力管理的执行情况纳入长期健康方案中。" + } + ], + "summary": { + "en": "英文总结意义(约80词):总结长期随访的价值,强调从被动反应到主动预防的转变...", + "cn": "中文总结意义(约100字):对应翻译..." + } + } + ] +} +``` + +--- + +## 重要提示 +1. **每个段落必须先英文后中文,不要混排** +2. **全篇禁止写具体数值、单位或参考区间,可以提及指标名称和偏高/偏低方向** +3. **总述引导的4个段落使用固定模板文本** +4. **五个模块必须严格按顺序:营养干预 → 运动干预 → 睡眠与压力管理 → 生活方式调整 → 长期随访计划** +5. **每个模块必须包含完整的结构:领域概述 → 检测关联 → 引导语 → 具体建议(3-5条) → 总结意义** +6. **具体建议要可执行,包含频率、时长、具体方法等** +7. **每条建议都要与患者的异常指标相关联** +8. **语气始终保持专业、客观、温和** +9. **只返回JSON,不要其他内容** diff --git a/backend/prompts/health_assessment_prompt.md b/backend/prompts/health_assessment_prompt.md new file mode 100644 index 0000000..419a8ca --- /dev/null +++ b/backend/prompts/health_assessment_prompt.md @@ -0,0 +1,180 @@ +# 整体健康情况分析生成提示词 + +## 角色设定 +你是Be.U Med功能医学团队的资深医学顾问,在功能医学、整体健康、抗衰老医学领域具有丰富的临床经验。 + +## 任务 +根据体检者的血液检查报告,撰写"整体健康情况分析"报告。 + +--- + +## 核心原则(必须严格遵守) + +### 1. 段落格式(极其重要!) +- **每个段落必须先写英文,再写对应的中文** +- **英文段落:80-120词,不超过150词** +- **中文段落:80-120字,不超过150字** +- 不要英中混排,必须分开 + +### 0. 禁止输出数值(新增硬性要求) +- **全篇禁止出现任何数字、百分号、参考区间、单位、符号组合(0-9, %, mg/dL, mmol/L, ×10^9/L 等)** +- 只能用定性描述:正常/稳定/良好/偏高/偏低/接近上限/接近下限 +- 不要抄写原始报告中的结果值、参考区间、单位 + +### 2. 语言风格 +- 专业、客观、严谨,体现功能医学视角 +- 使用肯定、明确的表述,如"表明""显示""反映""需要""应当" +- **严禁使用任何不确定表述**,包括但不限于: + - 中文禁用词:可能、也许、或许、大概、似乎、有概率、有可能、可能会、或可、疑似、倾向于、趋向于、不排除、有待、存在...的可能 + - 英文禁用词:may, might, could, possibly, probably, perhaps, likely, potentially, tend to, appear to, seem to, it is possible that +- 禁用"必须""一定""保证""治愈"等绝对化表述 +- 不做临床疾病诊断,聚焦功能状态分析 + +### 3. 
核心指标判定 +- **核心指标**:从医学角度判定各生理学系统的关键指标 +- **异常项**:超出参考范围的指标 + 逼近临界值的指标 +- 指标名称必须精准,仅作定性描述,不标注具体数值及单位 + +--- + +## 文章结构(必须严格遵循) + +### 总述概述(2段) +**第一段**: +- 前半部分:列出重点正常项及其状态(不写数值/单位/区间,仅说明正常/稳定/良好) +- 后半部分:列出重点异常项或临界项(不写数值/单位/区间,仅说明偏高/偏低/接近上限或下限) +- 格式:先英文后中文 + +**第二段(固定模板+动态填充)**: +- 此段大部分内容为固定模板,只有标注 `{{动态}}` 的部分需要根据检测数据生成 +- **英文固定模板**: + "From a functional medicine and holistic health perspective, the current test results indicate that most of your core parameters fall within normal reference ranges, suggesting an overall stable health status. However, several `{{动态:异常指标所属系统,如 hematological, endocrine, and immunological}}` markers present with abnormalities or borderline variations. These findings indicate that attention is warranted in areas such as `{{动态:具体关注方向,如 inflammatory regulation, hormonal balance, cellular repair, and immune tolerance}}`, requiring further attention and long-term follow-up. A multidimensional interpretation is outlined below:" +- **中文固定模板**: + "从功能医学与整体健康角度来看,本次检测结果显示您的多数核心指标处于正常参考范围,整体健康状态较为稳定。然而,部分`{{动态:异常指标所属系统,如 血液学、内分泌及免疫学}}`指标出现了异常或边缘波动,这提示机体`{{动态:具体关注方向,如 在炎症调控、荷尔蒙平衡、细胞更新以及免疫耐受等方面}}`需要关注,值得进一步关注与长期随访。以下从不同维度进行综合性解读:" +- **动态部分生成规则**: + - 根据检测数据中实际出现异常的系统来填充 + - 系统对应关系:Hematology→血液学/hematological,Endocrine→内分泌/endocrine,Immunology→免疫学/immunological,Metabolism→代谢/metabolic + - 关注方向需与异常系统对应,如血液学异常→炎症调控,内分泌异常→荷尔蒙平衡,免疫学异常→免疫耐受,代谢异常→营养代谢 +- **严禁使用**:"可能""潜在风险""suggest potential risks"等不确定表述 + +### 四大系统分析(固定顺序,每个系统2段) + +**通用段落结构(适用于所有四个系统)**: + +**第一段(约120字/词)**: +- 正常项分析:说明该系统中哪些指标处于正常范围,体现的健康状态(不写数值/单位/区间) +- 接近临界项分析:说明哪些指标虽在参考范围内但接近临界值,需要关注(不写数值/单位/区间) +- 分析提示:一句话总结该系统整体状态或建议 +- **不写具体数值**,只描述指标名称和状态 +- 格式:先英文后中文 + +**第二段(约120字/词)**: +- 异常项分析:说明该系统中哪些指标超出正常范围,异常的方向(偏高/偏低,不写数值/单位/区间) +- 对其他系统的影响:说明这些异常指标对其他生理系统的影响和关联 +- **不写具体数值**,只描述指标名称和影响 +- 格式:先英文后中文 + +#### (I) Hematology and Inflammatory Status / (一)血液学与炎症状态 +按上述通用段落结构撰写 + +#### (II) Hormonal and Endocrine Regulation / (二)荷尔蒙与内分泌调节 +按上述通用段落结构撰写 + +#### (III) Immunology and Infection 
Risk / (三)免疫学与感染风险 +按上述通用段落结构撰写 + +#### (IV) Nutrition and Metabolic Profile / (四)营养与代谢状况 +按上述通用段落结构撰写 + +### 结尾总结(2段) +**第一段 - 功能医学健康管理重点**: +- 结合本次检测发现的异常项和接近临界值的指标(不写数值/单位/区间) +- 从功能医学干预、生活方式调控、定期随访监测三个角度进行总结概括 +- 明确指出需要重点关注和干预的方向 +- 格式:先英文后中文 + +**第二段 - 个性化管理方向**: +- 固定开头(英文):"Functional medicine emphasizes proactive prevention. Before clinical symptoms manifest, functional optimization through targeted medical intervention, nutritional adjustment, and lifestyle management can significantly enhance systemic balance. The abnormal and borderline findings highlighted in this report provide valuable guidance for initiating personalized functional medicine management." +- 固定开头(中文):"功能医学强调'未病先防',在疾病尚未出现严重临床症状前,通过针对性医学干预、营养调整及生活方式管理改善机体功能。本报告中的异常和临界指标正提示您应开展个性化功能医学管理," +- **动态生成部分**:根据本次检测的实际异常指标,生成具体的干预方向描述(如"以调节血脂代谢、优化免疫功能、改善炎症状态"等),这部分需要与前面分析的异常项对应 +- 固定结尾(英文):"This not only helps prevent chronic diseases but also plays an important role in delaying aging and improving quality of life." +- 固定结尾(中文):"这不仅有助于慢病预防,也对延缓衰老与提升生活质量具有重要意义。" +- 格式:先英文后中文 + +--- + +## 输出格式(JSON) + +```json +{ + "overview": { + "paragraph1": { + "en": "英文(80-120词):前半部分列重点正常项状态(不写数值/单位/区间),后半部分列重点异常/临界项状态(不写数值/单位/区间)...", + "cn": "中文(80-120字):对应翻译(不写数值/单位/区间)..." + }, + "paragraph2": { + "en": "英文(固定模板+动态填充)...", + "cn": "中文(固定模板+动态填充)..." + } + }, + "systems": [ + { + "key": "Hematology", + "title_en": "(I) Hematology and Inflammatory Status", + "title_cn": "(一)血液学与炎症状态", + "paragraph1": { + "en": "英文(约120词):正常项分析 + 接近临界项分析 + 分析提示,不含具体数值...", + "cn": "中文(约120字):对应翻译..." + }, + "paragraph2": { + "en": "英文(约120词):异常项分析 + 对其他系统的影响,不含具体数值...", + "cn": "中文(约120字):对应翻译..." 
+ } + }, + { + "key": "Endocrine", + "title_en": "(II) Hormonal and Endocrine Regulation", + "title_cn": "(二)荷尔蒙与内分泌调节", + "paragraph1": {...}, + "paragraph2": {...} + }, + { + "key": "Immunology", + "title_en": "(III) Immunology and Infection Risk", + "title_cn": "(三)免疫学与感染风险", + "paragraph1": {...}, + "paragraph2": {...} + }, + { + "key": "Metabolism", + "title_en": "(IV) Nutrition and Metabolic Profile", + "title_cn": "(四)营养与代谢状况", + "paragraph1": {...}, + "paragraph2": {...} + } + ], + "conclusion": { + "management_focus": { + "en": "英文(80-120词):功能医学健康管理重点概括...", + "cn": "中文(80-120字):对应翻译..." + }, + "personalized_direction": { + "en": "英文(80-120词):个性化管理方向说明...", + "cn": "中文(80-120字):对应翻译..." + } + } +} +``` + +--- + +## 重要提示 +1. **每个段落必须先英文后中文,不要混排** +2. **全篇禁止写具体数值、单位或参考区间,只描述状态(正常/偏高/偏低/接近临界等)** + - 不要出现任何数字字符 0-9、百分号、幂次、科学计数或单位 +3. **总述第二段:固定模板+动态填充,只填充异常系统和关注方向** +4. **四大系统每段约120字/词,不写数值** +5. **第一段结构:正常项分析 → 接近临界项分析 → 分析提示(均不写数值)** +6. **第二段结构:异常项分析 → 对其他系统的影响(均不写数值)** +7. **结尾两段:功能医学管理重点 + 个性化管理方向(不写数值)** +8. **只返回JSON,不要其他内容** diff --git a/backend/prompts/medical_intervention_prompt.md b/backend/prompts/medical_intervention_prompt.md new file mode 100644 index 0000000..48de340 --- /dev/null +++ b/backend/prompts/medical_intervention_prompt.md @@ -0,0 +1,274 @@ +# 医学干预建议方案生成提示词 V2 + +## 角色设定 +你是Be.U Med功能医学团队的资深医学顾问,在功能医学、抗衰老医学、血液净化疗法、静脉营养疗法(IVNT)、生物同源性荷尔蒙疗法(BHRT)、干细胞再生医学领域具有丰富的临床经验。 + +## 任务 +根据体检者的异常指标,撰写个性化的「医学干预」建议方案。 +**硬性限制:** +- **不要写任何检验指标的具体数值、参考区间、单位、百分号或数字(0-9)** +- **疗法剂量/浓度不用写具体数字,可用“常规剂量/标准疗程/短程/中程/长程/每月一次”等文字描述** + +## 文章结构(必须严格遵循案例格式) + +### 总述部分(固定模板+动态填充) +- **标题**: `Medical Intervention 「医学干预」建议方案` + +**英文总述(固定模板)**: +"The current core medical intervention priorities are: `{{动态:干预方向,如 Vascular health (dyslipidemia + elevated lipoprotein(a)) → Thyroid autoimmune regulation → Iron metabolism balance → Hormonal homeostasis}}`." 
+ +"Based on this, the principle guiding your proposed medical intervention plan is to first \"reduce vascular risk and oxidative stress\", followed by \"immune regulation and hormonal optimization\". Your issue does not lie with acute organ dysfunction, but rather with \"`{{动态:具体异常组合,如 elevated lipoprotein(a) + thyroid autoimmune activation + iron overload + mild androgen imbalance}}`\"—a combination of chronic risk factors that synergistically threaten vascular and endocrine health. In functional medicine, this pattern requires a multi-targeted, integrated intervention to address both symptoms and root causes." + +**中文总述(固定模板)**: +"当前核心的医疗干预方向为:`{{动态:干预方向,如 血管健康 → 荷尔蒙平衡 → 微量元素调节 → 机体抗衰}}`。基于此,针对您的「医学干预」建议方案的原则是先"降低血管风险与氧化应激",再"免疫调节与荷尔蒙优化"。`{{动态:问题定性,如 您的问题并非急性脏器功能障碍,而是"脂蛋白(a)升高 + 甲状腺自身免疫活化 + 铁过载 + 轻度雄激素失衡"的慢性风险组合,这些因素协同威胁血管与内分泌健康}}`。在功能医学上,此类格局需要多靶点、一体化干预,实现标本兼顾。" + +**动态部分生成规则**: +- 干预方向:根据异常指标所属的系统生成,格式为 `A → B → C → D` +- 具体异常组合:列出主要异常指标的英文描述 +- 问题定性:用中文描述异常指标的组合特征 + +### 板块结构(根据异常指标动态选择2-4个板块) + +每个板块包含: +1. **板块标题**: `X) English Title 中文标题` +2. **子板块**: 每个板块包含2个子板块 +3. 
**内容格式**: 每个内容点先英文段落,再中文段落 + +--- + +## 可选板块类型(根据异常指标选择) + +### 板块A: Vascular Protection & Metabolic Regulation 血管保护与代谢调控 +**适用于**: 血脂异常、脂蛋白(a)升高、胆固醇异常、动脉硬化风险、同型半胱氨酸升高 + +**A.1 Blood Purification Therapy 血液净化疗法** +内容要点: +- 疗法定位(针对哪些异常指标) +- DFPP深度血液净化(频次、疗程、作用机制) +- Ozone轻盈血液净化(频次、疗程、作用机制) +- 干预意义 + +**A.2 IVNT Nutritional Intervention 静脉营养点滴干预** +内容要点: +- IVNT定位说明 +- 肝胆排毒配方(核心成分、频次、作用) +- 免疫激活疗法配方(核心成分、频次、作用) +- 血管保护配方(核心成分、频次、作用) +- 干预意义 + +--- + +### 板块B: Immune Regulation & Thyroid Protection 免疫调节与甲状腺保护 +**适用于**: 甲状腺抗体升高(TgAb/TPOAb)、自身免疫活化、甲状腺功能异常(TSH/T3/T4) + +**B.1 Lymphocyte Therapy 淋巴细胞疗法** +内容要点: +- 疗法定位(启动时机、针对问题) +- 免疫调节机制(Treg细胞作用) +- 干预意义 + +**B.2 BHRT – Bioidentical Hormone Therapy 生物同源性荷尔蒙调理** +内容要点: +- 方案定位(靶向调节而非外源替代) +- 甲状腺轴优化(硒、维生素D3等) +- 性激素轴调节(根据性别和具体异常) +- 干预意义 + +--- + +### 板块C: Hormone-Centered Improvement Plan 荷尔蒙调理方案 +**适用于**: 性激素异常(FSH/LH/E2/PROG/TESTO)、围绝经期、DHEA-S异常、AMH降低 + +**C.1 Thyroid Axis Optimization 甲状腺轴优化** +内容要点: +- 甲状腺功能评估 +- 营养素支持方案 +- 监测计划 + +**C.2 Sex-steroid / Perimenopause Support 性激素与围绝经管理** +内容要点: +- 激素状态评估 +- BHRT个体化方案 +- 生活方式配合 +- 干预意义 + +--- + +### 板块D: Iron Metabolism Balance & Lifestyle Optimization 铁代谢平衡与生活方式优化 +**适用于**: 铁蛋白异常、铁过载、微量元素失衡、需要生活方式干预 + +**D.1 ONS (Oral Nutritional Support) 口服营养支持** +内容要点: +- 方案定位 +- 铁代谢调节(乳铁蛋白拮抗剂、维生素E等) +- 血管保护辅助(辅酶Q10、植物甾醇等) + +**D.2 Lifestyle Optimization 生活方式优化** +内容要点: +- 饮食建议(具体食物推荐和禁忌) +- 运动建议(类型、频次、时长) +- 作息与压力管理 +- 干预意义 + +--- + +### 板块E: Cellular Regeneration 细胞再生疗法(干细胞) +**适用于**: 多系统慢性问题、组织修复需求、抗衰老、作为其他干预的协同增效 + +**E.1 Clinical Positioning 临床定位** +内容要点: +- MSC疗法定位(再生修复+免疫调节) +- 启动时机(核心干预后3个月) +- 频次和疗程 +- MSC作用机制 + +**E.2 Synergy with Other Interventions 与其他干预的协同** +内容要点: +- 与血液净化的协同(血管保护) +- 与免疫调节的协同(甲状腺支持) +- 与BHRT的协同(荷尔蒙优化) + +--- + +### 签名部分(右对齐,空一行后显示) +``` +Functional Medical Team from Be.U Med +Be.U Med 功能医学团队 +YYYY年MM月DD日 +``` + +--- + +## 内容写作规范 + +### 段落格式(极其重要!严格遵守字数限制!) 
+- **每个子板块必须包含2-4个独立的内容段落,禁止合并成一整段** +- **每个内容段落必须先写英文,再写对应的中文** +- 英文段落:50-80词,绝对不能超过80词 +- 中文段落:70-120字,绝对不能超过120字(硬性上限140字) +- 不要英中混排,必须分开 +- **如果内容较多,必须拆分成多个段落,每个段落控制在限制内** + +### 子板块段落结构(必须遵循) +每个子板块应包含以下2-4个独立段落: +1. **段落1 - 疗法定位/概述**:说明该疗法针对哪些问题 +2. **段落2 - 具体方案**:详细的疗法/成分/机制说明 +3. **段落3 - 补充方案**(如有):其他相关疗法或配方 +4. **段落4 - 干预意义**:总结该干预的价值和协同作用(必须有) + +### 模块总结要求(重要!) +- **每个区域模块的大标题后面、子标题前面必须有一个总结性陈述段落** +- 总结段落约120-140字(中文),对应英文约80-100词 +- 总结内容:概括该模块要解决的核心问题、干预方向、预期价值 +- 格式:先英文总结,再中文总结 +- 位置:区域标题 → 模块总结 → 子标题1 → 子标题2... + +### 具体内容要求 +1. **疗法细节必须具体**: + - 频次(如:每3周1次、每月1次) + - 疗程(如:共3次、共6次) + - 时长(如:每次120分钟、每次30分钟) + - 核心成分(如:维生素C、谷胱甘肽、α-硫辛酸) + +2. **作用机制要清晰**: + - 说明疗法如何作用于具体异常指标 + - 解释生理机制 + +3. **干预意义要总结**: + - 每个子板块结尾要有"干预意义"段落 + - 说明该干预与其他干预的协同作用 + +### 语言风格 +- 专业但易懂,体现功能医学理念 +- 使用肯定、明确的表述,如"建议""支持""助力""优化""需要""应当" +- **严禁使用任何不确定表述**,包括但不限于: + - 中文禁用词:可能、也许、或许、大概、似乎、有概率、有可能、可能会、或可、疑似、倾向于、趋向于、不排除、有待、存在...的可能 + - 英文禁用词:may, might, could, possibly, probably, perhaps, likely, potentially, tend to, appear to, seem to, it is possible that +- 禁用"必须""一定""保证""治愈"等绝对化表述 +- 不夸大效果,实事求是 + +--- + +## 输出格式(JSON) + +```json +{ + "overview": { + "title_en": "Medical Intervention", + "title_cn": "「医学干预」建议方案", + "content_en": "英文总述(150-200词),包含核心干预方向、干预原则、问题定性...", + "content_cn": "中文总述(100-140字),对应英文内容..." + }, + "sections": [ + { + "number": "1", + "title_en": "Vascular Protection & Metabolic Regulation", + "title_cn": "血管保护与代谢调控", + "intro": { + "en": "模块总结英文(80-100词):概括该模块要解决的核心问题、干预方向、预期价值...", + "cn": "模块总结中文(120-140字):概括该模块要解决的核心问题、干预方向、预期价值..." + }, + "subsections": [ + { + "id": "1.1", + "title_en": "Blood Purification Therapy (DFPP + Ozone)", + "title_cn": "血液净化疗法(DFPP + Ozone)", + "paragraphs": [ + { + "en": "英文段落1:疗法定位和概述...", + "cn": "中文段落1:对应翻译..." + }, + { + "en": "英文段落2:DFPP疗法详情(频次、疗程、机制)...", + "cn": "中文段落2:对应翻译..." + }, + { + "en": "英文段落3:Ozone疗法详情...", + "cn": "中文段落3:对应翻译..." + }, + { + "en": "Intervention Significance: 干预意义英文...", + "cn": "干预意义: 中文..." 
+ } + ] + }, + { + "id": "1.2", + "title_en": "IVNT Nutritional Intervention", + "title_cn": "静脉营养点滴干预", + "paragraphs": [...] + } + ] + }, + { + "number": "2", + "title_en": "Immune Regulation & Thyroid Protection", + "title_cn": "免疫调节与甲状腺保护", + "intro": { + "en": "模块总结英文...", + "cn": "模块总结中文..." + }, + "subsections": [...] + } + ], + "signature": { + "team_en": "Functional Medical Team from Be.U Med", + "team_cn": "Be.U Med 功能医学团队", + "date": "YYYY年MM月DD日(使用当前实际日期)" + } +} +``` + +--- + +## 重要提示 +1. **每个子板块必须包含2-4个独立段落,禁止合并成一整段** +2. **中文段落严格控制在70-120字,绝对不能超过140字** +3. **英文段落严格控制在50-80词,绝对不能超过80词** +4. **内容必须与体检者的具体异常指标关联,且不写检测数值/参考区间/单位/百分号/任何数字** +5. **根据异常指标类型选择合适的板块**(不是所有板块都需要) +6. **每个疗法说明频次/疗程/核心成分,避免写剂量、浓度、具体数值;可用“常规/标准疗程”“短程/中程/长程”或“每月一次”等文字描述** +7. **强调各干预之间的协同作用** +8. **每个内容点先英文后中文,不要混排** +9. **语气始终保持专业、客观、温和** +10. **只返回JSON,不要其他内容** diff --git a/backend/rebuild_config_from_template.py b/backend/rebuild_config_from_template.py new file mode 100644 index 0000000..77ff3ab --- /dev/null +++ b/backend/rebuild_config_from_template.py @@ -0,0 +1,107 @@ +""" +从模板文件重新生成 abb_mapping_config.json +只包含模板中实际存在的项目 +""" +from docx import Document +import json +import re + +def extract_items_from_template(): + """从模板中提取所有检测项目""" + doc = Document('template_complete.docx') + + # 模块映射:根据模板中的标题识别模块 + module_items = {} + current_module = None + + # 遍历所有表格 + for table in doc.tables: + for row in table.rows: + row_text = ' '.join([cell.text.strip() for cell in row.cells]) + + # 检测模块标题 + if 'Urine Detection' in row_text or '尿液检测' in row_text: + current_module = 'Urine Test' + elif 'Complete Blood Count' in row_text or '血常规' in row_text: + current_module = 'Complete Blood Count' + elif 'Blood Sugar' in row_text or '血糖' in row_text: + current_module = 'Blood Sugar' + elif 'Lipid Profile' in row_text or '血脂' in row_text: + current_module = 'Lipid Profile' + elif 'Blood Type' in row_text and '血型' in row_text: + current_module = 'Blood Type' + elif 'Blood Coagulation' in row_text 
or '凝血' in row_text: + current_module = 'Blood Coagulation' + elif 'Four Infectious' in row_text or '传染病' in row_text: + current_module = 'Four Infectious Diseases' + elif 'Serum Electrolytes' in row_text or '电解质' in row_text: + current_module = 'Serum Electrolytes' + elif 'Liver Function' in row_text or '肝功能' in row_text: + current_module = 'Liver Function' + elif 'Kidney Function' in row_text or '肾功能' in row_text: + current_module = 'Kidney Function' + elif 'Myocardial Enzyme' in row_text or '心肌酶' in row_text: + current_module = 'Myocardial Enzyme' + elif 'Thyroid Function' in row_text or '甲状腺' in row_text: + current_module = 'Thyroid Function' + elif 'Thromboembolism' in row_text or '血栓' in row_text: + current_module = 'Thromboembolism' + elif 'Bone Metabolism' in row_text or '骨代谢' in row_text: + current_module = 'Bone Metabolism' + elif 'Microelement' in row_text or '微量元素' in row_text: + current_module = 'Microelement' + elif 'Humoral Immunity' in row_text or '体液免疫' in row_text: + current_module = 'Humoral Immunity' + elif 'Inflammatory' in row_text or '炎症' in row_text: + current_module = 'Inflammatory Reaction' + elif 'Autoantibody' in row_text or '自身抗体' in row_text: + current_module = 'Autoantibody' + elif 'Female Hormone' in row_text or '女性荷尔蒙' in row_text: + current_module = 'Female Hormone' + elif 'Male Hormone' in row_text or '男性荷尔蒙' in row_text: + current_module = 'Male Hormone' + elif 'Tumor Markers' in row_text or '肿瘤标志物' in row_text: + current_module = 'Tumor Markers' + elif 'Lymphocyte' in row_text or '淋巴细胞亚群' in row_text: + current_module = 'Lymphocyte Subpopulation' + + # 提取ABB(第一列,短文本) + first_cell = row.cells[0].text.strip() if row.cells else '' + if first_cell and len(first_cell) < 30: + if 'Abb' not in first_cell and '简称' not in first_cell: + if not first_cell.startswith('Clinical') and not first_cell.startswith('临床'): + # 检查是否有临床意义(确认是数据行) + has_clinical = any('Clinical Significance' in cell.text for cell in row.cells) + if has_clinical and 
current_module: + if current_module not in module_items: + module_items[current_module] = [] + + # 获取项目名称(第二列) + project_name = row.cells[1].text.strip() if len(row.cells) > 1 else first_cell + + # 避免重复 + existing_abbs = [item['abb'] for item in module_items[current_module]] + if first_cell not in existing_abbs: + module_items[current_module].append({ + 'abb': first_cell, + 'project': project_name + }) + + return module_items + +def main(): + print('从模板提取检测项目...') + module_items = extract_items_from_template() + + total = sum(len(items) for items in module_items.values()) + print(f'共提取 {len(module_items)} 个模块, {total} 个项目') + + for module, items in module_items.items(): + print(f' {module}: {len(items)} 项') + for item in items[:3]: + print(f' - {item["abb"]}') + if len(items) > 3: + print(f' ... 还有 {len(items)-3} 项') + +if __name__ == '__main__': + main() diff --git a/backend/rebuild_template_explanations.py b/backend/rebuild_template_explanations.py new file mode 100644 index 0000000..e562866 --- /dev/null +++ b/backend/rebuild_template_explanations.py @@ -0,0 +1,95 @@ +""" +从模板文件重新提取所有检测项目的临床意义 +生成正确的 template_explanations.json +""" +from docx import Document +import json + +def main(): + doc = Document('template_complete.docx') + + explanations = {} + + # 遍历所有表格 + for table in doc.tables: + rows = list(table.rows) + + for i, row in enumerate(rows): + cells = row.cells + if not cells: + continue + + first_cell = cells[0].text.strip() + + # 跳过空行、表头、模块标题 + if not first_cell: + continue + if 'Abb' in first_cell or '简称' in first_cell: + continue + if 'Clinical' in first_cell: + continue + + # 检查是否是ABB行(短文本,不含占位符,不含中文模块名) + if len(first_cell) > 40 or '{{' in first_cell: + continue + + # 跳过模块标题(包含换行符和中文) + if '\n' in first_cell and any('\u4e00' <= c <= '\u9fff' for c in first_cell): + continue + + # 这是一个ABB,查找下一行的临床意义 + abb = first_cell + + # 在当前行或下一行查找临床意义 + clinical_text = None + + # 先检查当前行的其他单元格 + for cell in cells: + text = cell.text.strip() + if 'Clinical 
Significance:' in text and '临床意义:' in text: + clinical_text = text + break + + # 如果当前行没有,检查下一行 + if not clinical_text and i + 1 < len(rows): + next_row = rows[i + 1] + for cell in next_row.cells: + text = cell.text.strip() + if 'Clinical Significance:' in text and '临床意义:' in text: + clinical_text = text + break + + if clinical_text: + # 提取英文和中文 + parts = clinical_text.split('临床意义:') + if len(parts) == 2: + en = parts[0].replace('Clinical Significance:', '').strip() + cn = parts[1].strip() + + if en and cn: + # 标准化ABB名称 + abb_key = abb.upper().strip() + abb_key = abb_key.replace(' - ', '-').replace('(', '(').replace(')', ')') + + if abb_key not in explanations: + explanations[abb_key] = { + 'clinical_en': en, + 'clinical_cn': cn + } + + print(f'从模板提取了 {len(explanations)} 个项目的临床意义') + + # 保存 + with open('template_explanations.json', 'w', encoding='utf-8') as f: + json.dump(explanations, f, ensure_ascii=False, indent=2) + + print(f'已保存到 template_explanations.json') + + # 验证 Color + if 'COLOR' in explanations: + print(f'\nCOLOR 验证:') + print(f'EN: {explanations["COLOR"]["clinical_en"][:80]}...') + print(f'CN: {explanations["COLOR"]["clinical_cn"][:80]}...') + +if __name__ == '__main__': + main() diff --git a/backend/requirements.txt b/backend/requirements.txt new file mode 100644 index 0000000..0cdd860 --- /dev/null +++ b/backend/requirements.txt @@ -0,0 +1,29 @@ +fastapi==0.104.1 +uvicorn==0.24.0 +python-multipart==0.0.6 +pydantic==2.5.0 +requests==2.31.0 +python-dotenv==1.0.0 + +# OCR相关(可选) +baidu-aip==4.16.13 +paddleocr==2.7.0 +paddlepaddle==2.5.2 +pdfplumber==0.10.3 +Pillow>=10.0.0 + +# MinerU核心依赖(高精度文档解析) +loguru>=0.7.2 +numpy>=1.21.6 +tqdm>=4.67.1 + +# LLM相关(根据需要选择安装) +openai==1.3.0 +cozepy # Coze官方Python SDK + +# Word和PDF处理 +PyMuPDF # PDF处理 +python-docx # Word文档处理 +weasyprint==60.1 # 推荐,质量更好(需要GTK3) +xhtml2pdf==0.2.13 # 备用方案,更简单(纯Python) +jinja2==3.1.2 diff --git a/backend/run_report_generation.py b/backend/run_report_generation.py new file mode 100644 index 
0000000..dfa8bfa --- /dev/null +++ b/backend/run_report_generation.py @@ -0,0 +1,162 @@ +""" +统一的医疗报告生成脚本运行器 + +使用方法: + # 使用 extract_and_fill_report.py(推荐 - 功能完整) + python run_report_generation.py --method extract + + # 使用 fill_with_docxtpl.py(简单模板填充) + python run_report_generation.py --method docxtpl + + # 强制重新提取(不使用缓存) + python run_report_generation.py --method extract --force + + # 使用DeepSeek分析 + python run_report_generation.py --method extract --deepseek + + # 指定DeepSeek API Key + python run_report_generation.py --method extract --deepseek --api-key YOUR_KEY +""" +import sys +import os +from pathlib import Path +from dotenv import load_dotenv + +# 加载环境变量 +load_dotenv() + +def print_config_info(method: str, force: bool, use_deepseek: bool): + """打印配置信息""" + print("=" * 70) + print(" 医疗报告生成系统") + print("=" * 70) + print(f" 运行方式: {method.upper()}") + print(f" 强制刷新: {'是' if force else '否(使用缓存)'}") + print(f" DeepSeek分析: {'启用 [OK]' if use_deepseek else '关闭'}") + + # 检查关键文件 + base_dir = Path(__file__).parent + template_complete = base_dir / "template_complete.docx" + template_docxtpl = Path(__file__).parent.parent / "template_docxtpl.docx" + config_file = base_dir / "abb_mapping_config.json" + + print(f"\n 关键文件检查:") + print(f" - 模板文件 (extract): {'[OK] 存在' if template_complete.exists() else '[X] 缺失'}") + print(f" - 模板文件 (docxtpl): {'[OK] 存在' if template_docxtpl.exists() else '[X] 缺失'}") + print(f" - 配置文件: {'[OK] 存在' if config_file.exists() else '[X] 缺失'}") + + if use_deepseek: + deepseek_key = os.environ.get('DEEPSEEK_API_KEY', '') + print(f" - DeepSeek API Key: {'[OK] 已配置' if deepseek_key else '[X] 未配置'}") + + print("=" * 70) + print() + + +def run_extract_method(force: bool, use_deepseek: bool, api_key: str = None): + """运行 extract_and_fill_report.py 方法""" + try: + from extract_and_fill_report import main as extract_main + extract_main(force_extract=force, use_deepseek=use_deepseek, deepseek_api_key=api_key) + except ImportError as e: + print(f"[ERROR] 导入失败: {e}") + 
print(" 请确保 extract_and_fill_report.py 文件存在") + sys.exit(1) + except Exception as e: + print(f"[ERROR] 运行失败: {e}") + import traceback + traceback.print_exc() + sys.exit(1) + + +def run_docxtpl_method(): + """运行 fill_with_docxtpl.py 方法""" + try: + from fill_with_docxtpl import main as docxtpl_main + docxtpl_main() + except ImportError as e: + print(f"[ERROR] 导入失败: {e}") + print(" 请确保 fill_with_docxtpl.py 文件存在") + sys.exit(1) + except Exception as e: + print(f"[ERROR] 运行失败: {e}") + import traceback + traceback.print_exc() + sys.exit(1) + + +def main(): + """主函数""" + # 解析命令行参数 + method = 'extract' # 默认方法 + force = False + use_deepseek = False + api_key = None + + # 解析参数 + args = sys.argv[1:] + i = 0 + while i < len(args): + arg = args[i] + + if arg in ['--method', '-m']: + if i + 1 < len(args): + method = args[i + 1] + i += 2 + else: + print("[ERROR] --method 参数需要指定值: extract 或 docxtpl") + sys.exit(1) + elif arg in ['--force', '-f']: + force = True + i += 1 + elif arg in ['--deepseek', '-d']: + use_deepseek = True + i += 1 + elif arg in ['--api-key', '-k']: + if i + 1 < len(args): + api_key = args[i + 1] + i += 2 + else: + print("[ERROR] --api-key 参数需要指定API Key") + sys.exit(1) + elif arg in ['--help', '-h']: + print(__doc__) + sys.exit(0) + else: + print(f"[ERROR] 未知参数: {arg}") + print(" 使用 --help 查看帮助信息") + sys.exit(1) + + # 如果没有指定API Key,尝试从环境变量获取 + if use_deepseek and not api_key: + api_key = os.environ.get('DEEPSEEK_API_KEY', '') + if not api_key: + print("[WARNING] 使用DeepSeek需要提供API Key") + print(" 方法1: 设置环境变量 DEEPSEEK_API_KEY") + print(" 方法2: 使用参数 --api-key YOUR_KEY") + sys.exit(1) + + # 验证方法 + if method not in ['extract', 'docxtpl']: + print(f"[ERROR] 未知的方法: {method}") + print(" 支持的方法: extract, docxtpl") + sys.exit(1) + + # 打印配置信息 + print_config_info(method, force, use_deepseek) + + # 运行对应的方法 + if method == 'extract': + run_extract_method(force, use_deepseek, api_key) + elif method == 'docxtpl': + if force or use_deepseek: + print("[WARNING] docxtpl 方法不支持 
--force 和 --deepseek 参数,将忽略") + run_docxtpl_method() + + print("\n" + "=" * 70) + print("[SUCCESS] 脚本执行完成!") + print("=" * 70) + + +if __name__ == '__main__': + main() diff --git a/backend/services/__init__.py b/backend/services/__init__.py new file mode 100644 index 0000000..e69de29 diff --git a/backend/services/batch_report_service.py b/backend/services/batch_report_service.py new file mode 100644 index 0000000..ba94906 --- /dev/null +++ b/backend/services/batch_report_service.py @@ -0,0 +1,801 @@ +import os +import tempfile +from pathlib import Path +from typing import List, Dict, Any +from datetime import datetime + +# 导入 DeepSeek 健康内容生成服务 +from services.deepseek_health_service import DeepSeekHealthService + + +class BatchReportService: + """批量报告处理服务""" + + def __init__(self, ocr_service, llm_service, pdf_service, template_service): + self.ocr_service = ocr_service + self.llm_service = llm_service + self.pdf_service = pdf_service + self.template_service = template_service + + # 初始化 DeepSeek 健康内容生成服务 + self.deepseek_health_service = DeepSeekHealthService() + + # 临时文件目录 + self.temp_dir = Path(tempfile.gettempdir()) / "medical_reports_temp" + self.temp_dir.mkdir(exist_ok=True) + + def process_multiple_reports( + self, + file_paths: List[str], + patient_name: str = "患者" + ) -> Dict[str, Any]: + """ + 处理多个报告文件并生成综合健康报告 + + 新流程:直接将文件传给 Coze 工作流处理 + + Args: + file_paths: 临时上传的多个PDF文件路径列表 + patient_name: 患者姓名 + + Returns: + 包含分析结果和生成的PDF路径的字典 + """ + try: + print(f"正在处理 {len(file_paths)} 份报告...") + + # 准备文件信息列表 + file_infos = [] + for idx, file_path in enumerate(file_paths, 1): + filename = Path(file_path).name + print(f" [{idx}/{len(file_paths)}] 准备文件: {filename}") + file_infos.append({ + "filename": filename, + "filepath": file_path + }) + + # 调用分析(会根据 LLM 类型选择不同的处理方式) + print("正在进行综合分析...") + combined_analysis = self._analyze_with_files(file_infos) + + # 使用 DeepSeek 生成健康评估和建议内容 + if self.deepseek_health_service.is_available(): + health_content = 
self.deepseek_health_service.generate_health_content(combined_analysis) + if health_content: + combined_analysis["health_assessment"] = health_content.get("health_assessment", {}) + combined_analysis["health_advice"] = health_content.get("health_advice", {}) + # 更新异常项(DeepSeek 可能提供更详细的信息) + if health_content.get("abnormal_items"): + combined_analysis["abnormal_items_detailed"] = health_content["abnormal_items"] + else: + print("\n ⚠️ DeepSeek API Key 未配置,跳过健康评估和建议生成") + + # 生成综合报告 PDF + print("\n正在生成健康报告...") + pdf_path = self._generate_comprehensive_report( + patient_name=patient_name, + reports=file_infos, + analysis=combined_analysis + ) + + # 清理临时文件 + print("正在清理临时文件...") + self._cleanup_temp_files(file_paths) + + return { + "success": True, + "patient_name": patient_name, + "report_count": len(file_paths), + "analysis": combined_analysis, + "pdf_path": pdf_path, + "generated_at": datetime.now().isoformat() + } + + except Exception as e: + # 即使出错也要清理临时文件 + self._cleanup_temp_files(file_paths) + raise Exception(f"批量处理失败: {str(e)}") + + def _analyze_with_files(self, file_infos: List[Dict[str, str]]) -> Dict[str, Any]: + """ + 综合分析流程(两阶段处理): + 1. OCR 提取所有文件的文本 + 2. Coze 分析文本 → 返回 JSON + 3. Ollama 处理 Coze JSON → 生成 Be.U 风格报告 + """ + # 第1步:OCR 提取文本 + print(" [步骤1] OCR 提取文本...") + extracted_texts = [] + for idx, file_info in enumerate(file_infos, 1): + print(f" [{idx}/{len(file_infos)}] 识别: {file_info['filename']}") + text = self.ocr_service.extract_text(file_info["filepath"]) + extracted_texts.append({ + "filename": file_info["filename"], + "text": text + }) + + # 第2-3步:LLM 分析(Coze → Ollama 或 纯 Ollama) + print(" [步骤2-3] 综合分析与报告生成...") + return self._analyze_combined_reports(extracted_texts) + + def _analyze_with_coze_files(self, file_infos: List[Dict[str, str]]) -> Dict[str, Any]: + """ + 使用 Coze 文件上传 + 工作流处理 + 1. 上传文件到 Coze 获取 file_id + 2. 分批调用工作流(每批最多 3 个文件) + 3. 
合并结果 + """ + import requests + import json + import time + + api_key = os.getenv("COZE_API_KEY") + workflow_id = os.getenv("COZE_WORKFLOW_ID") + + if not api_key or not workflow_id: + raise ValueError("未配置 Coze API 所需的 COZE_API_KEY 或 COZE_WORKFLOW_ID") + + # 第1步:上传所有文件获取 file_id + file_ids = [] + for idx, file_info in enumerate(file_infos, 1): + print(f" [{idx}/{len(file_infos)}] 上传: {file_info['filename']}") + + try: + file_id = self._upload_file_to_coze( + file_path=file_info['filepath'], + api_key=api_key + ) + file_ids.append({ + "filename": file_info['filename'], + "file_id": file_id + }) + print(f" ✓ File ID: {file_id}") + except Exception as e: + print(f" ✗ 上传失败: {e}") + raise Exception(f"文件上传失败: {file_info['filename']}, {e}") + + # 第2步:一次性调用工作流处理所有文件 + print(f"\n [步骤2] 调用 Coze 工作流分析 {len(file_ids)} 个文件...") + + # 构造请求参数:input 是字符串数组,每个元素是 JSON 字符串 + input_params = [] + for file_data in file_ids: + # 每个元素是 JSON 字符串格式:"{\"file_id\":\"xxx\"}" + json_str = json.dumps({"file_id": file_data["file_id"]}, ensure_ascii=False) + input_params.append(json_str) + print(f" - {file_data['filename']}: {file_data['file_id']}") + + # 调用工作流 + try: + final_result = self._call_coze_workflow( + workflow_id=workflow_id, + api_key=api_key, + input_params=input_params + ) + print(f" ✓ 工作流处理完成") + except Exception as e: + print(f" ✗ 工作流调用失败: {e}") + raise + + # 保存结果缓存 + try: + cache_file = Path("coze_result_cache.json") + cache_data = { + "timestamp": time.strftime('%Y-%m-%d %H:%M:%S'), + "report_count": len(file_ids), + "coze_result": final_result, + "file_ids": file_ids + } + cache_file.write_text(json.dumps(cache_data, ensure_ascii=False, indent=2), encoding='utf-8') + print(f" → Coze 结果已缓存到: {cache_file.absolute()}") + except Exception as e: + print(f" ⚠️ 缓存保存失败: {e}") + + return final_result + + def _upload_file_to_coze(self, file_path: str, api_key: str) -> str: + """ + 上传文件到 Coze 获取 file_id + """ + import requests + + upload_url = "https://api.coze.cn/v1/files/upload" + + 
headers = { + "Authorization": f"Bearer {api_key}" + } + + with open(file_path, 'rb') as f: + files = { + 'file': (Path(file_path).name, f, 'application/octet-stream') + } + + response = requests.post( + upload_url, + headers=headers, + files=files, + timeout=60 + ) + + if response.status_code != 200: + raise Exception(f"上传失败 (HTTP {response.status_code}): {response.text}") + + data = response.json() + + # 解析返回的 file_id + if data.get("code") == 0 and data.get("data"): + file_id = data["data"].get("id") or data["data"].get("file_id") + if file_id: + return file_id + + raise Exception(f"未能获取 file_id: {data}") + + def _call_coze_workflow(self, workflow_id: str, api_key: str, input_params: List[str]) -> Dict[str, Any]: + """ + 调用 Coze 工作流(使用流式接口) + input_params: 字符串数组,每个元素是 JSON 字符串格式的 file_id + """ + from cozepy import Coze, TokenAuth, COZE_CN_BASE_URL, WorkflowEventType + + # 初始化 Coze 客户端 + coze = Coze(auth=TokenAuth(token=api_key), base_url=COZE_CN_BASE_URL) + + print(f" → 调用工作流 (file_id 数量: {len(input_params)})...") + + # 调用流式工作流 + import time as time_module + start = time_module.time() + + stream = coze.workflows.stream_run( + workflow_id=workflow_id, + parameters={"input": input_params} + ) + + content_result = None + + for event in stream: + if event.event == WorkflowEventType.MESSAGE: + if hasattr(event, 'message'): + msg = event.message + node_title = getattr(msg, 'node_title', None) + node_is_finish = getattr(msg, 'node_is_finish', None) + content = getattr(msg, 'content', None) + + if node_title == "End" and node_is_finish and content: + content_result = content + break + + elif event.event == WorkflowEventType.ERROR: + error_msg = str(event.error) if hasattr(event, 'error') else "Unknown error" + raise Exception(f"工作流执行错误: {error_msg}") + + elapsed = time_module.time() - start + + if not content_result: + raise Exception("未获取到工作流执行结果") + + print(f" ✓ 工作流完成 (耗时: {elapsed:.1f}秒)") + + # 解析结果 + import json + try: + if isinstance(content_result, str): + 
result_data = json.loads(content_result) + else: + result_data = content_result + + # 提取 output + if isinstance(result_data, dict) and "output" in result_data: + output = result_data["output"] + if isinstance(output, str): + output = json.loads(output) + return output + + return result_data + except json.JSONDecodeError as e: + raise Exception(f"解析工作流结果失败: {e}") + + def _call_coze_with_files(self, file_infos: List[Dict[str, str]]) -> Dict[str, Any]: + """ + 直接将文件传给 Coze 工作流处理 + """ + try: + import requests + import json + + # 读取 Coze 配置 + api_url = os.getenv("COZE_API_URL", "https://api.coze.cn/v1/workflow/run") + api_key = os.getenv("COZE_API_KEY") + workflow_id = os.getenv("COZE_WORKFLOW_ID") + max_retries = int(os.getenv("COZE_MAX_RETRIES", "3")) + + if not api_key or not workflow_id: + raise ValueError("未配置 Coze API 所需的 COZE_API_KEY 或 COZE_WORKFLOW_ID") + + # 准备文件数据(根据 Coze API 要求构造) + # 如果 Coze 需要文件内容,读取并转换为 base64 + files_data = [] + for file_info in file_infos: + filepath = file_info["filepath"] + filename = file_info["filename"] + + # 读取文件内容并转换为 base64 + with open(filepath, 'rb') as f: + import base64 + file_content = base64.b64encode(f.read()).decode('utf-8') + + files_data.append({ + "filename": filename, + "content": file_content, + "type": "application/pdf" if filename.endswith('.pdf') else "image/jpeg" + }) + + headers = { + "Authorization": f"Bearer {api_key}", + "Content-Type": "application/json" + } + + # 构造请求体 + payload = { + "workflow_id": workflow_id, + "parameters": { + "input": files_data # 传递文件数组 + } + } + + print(f" 正在调用 Coze 工作流处理 {len(files_data)} 个文件...") + + last_error = None + for attempt in range(max_retries): + try: + response = requests.post(api_url, headers=headers, json=payload, timeout=300) + response.raise_for_status() + data = response.json() + + # 解析 Coze 返回结果 + if isinstance(data, dict) and data.get("code") == 0: + raw_data = data.get("data", {}) + if isinstance(raw_data, str): + try: + raw_data = json.loads(raw_data) + except 
json.JSONDecodeError: + pass + + output = raw_data.get("output", raw_data) + + # 使用 Ollama 增强 Coze 结果 + print(" [阶段2] 使用 Ollama 优化报告内容...") + final_result = self._enhance_with_ollama(output, f"处理了 {len(files_data)} 个文件") + + return final_result + + last_error = f"Coze API 返回非0 code: {data}" + + except Exception as e: + last_error = str(e) + if attempt < max_retries - 1: + wait_time = (attempt + 1) * 3 + print(f" 重试 {attempt + 1}/{max_retries}...") + import time + time.sleep(wait_time) + else: + break + + raise Exception(last_error or "Coze API 调用失败") + + except Exception as e: + print(f" ⚠ Coze 处理失败: {e}") + # 降级到 OCR + Ollama 方式 + print(" 降级到 OCR + Ollama 处理...") + extracted_texts = [] + for file_info in file_infos: + text = self.ocr_service.extract_text(file_info["filepath"]) + extracted_texts.append({ + "filename": file_info["filename"], + "text": text + }) + return self._analyze_combined_reports(extracted_texts) + + def _analyze_combined_reports(self, reports: List[Dict[str, str]]) -> Dict[str, Any]: + """ + 综合分析流程(两阶段处理): + - 如果使用 Coze: + 阶段1: Coze 工作流处理数据 → 返回结构化 JSON + 阶段2: Ollama 分析 JSON → 生成适配 Be.U 模板的专业报告 + - 如果使用其他LLM: + 直接使用该 LLM 生成报告 + """ + + if self.llm_service.llm_type == "coze": + # === 两阶段处理:Coze + Ollama === + + # 【阶段1】Coze 工作流分析 + print(" [阶段1/2] Coze 工作流分析中...") + print(f" - 处理 {len(reports)} 份报告") + + # Coze 工作流有执行超时限制,超过阈值时分批处理 + BATCH_SIZE = 3 # 每批最多 3 个报告 + + import json + + # 原始文本列表(给 Ollama 使用) + original_texts: List[str] = [] + # 所有批次的 Coze 结果 + all_coze_results: List[Dict[str, Any]] = [] + + # 分批处理 + total_batches = (len(reports) + BATCH_SIZE - 1) // BATCH_SIZE + + if total_batches > 1: + print(f" - 报告数量较多,将分 {total_batches} 批处理(每批 {BATCH_SIZE} 个)") + + for batch_idx in range(total_batches): + start_idx = batch_idx * BATCH_SIZE + end_idx = min(start_idx + BATCH_SIZE, len(reports)) + batch_reports = reports[start_idx:end_idx] + + if total_batches > 1: + print(f"\n [批次 {batch_idx + 1}/{total_batches}] 处理报告 {start_idx + 1}-{end_idx}") + + 
# 准备当前批次的数据 + coze_inputs: List[Dict[str, str]] = [] + + for report in batch_reports: + filename = report["filename"] + text = report["text"] + + # 保留一份可读的原始文本 + original_text = f"【文件名】{filename}\n【内容】\n{text}" + original_texts.append(original_text) + + # 构造传给 Coze 的 JSON 对象 + coze_obj = { + "filename": filename, + "text": text, + } + coze_inputs.append(coze_obj) + + print(f" - {filename}: {len(text)} 字符") + + print(f" - 本批次元素个数: {len(coze_inputs)}") + + # 保存当前批次的调试信息 + if total_batches > 1: + debug_file = Path(f"debug_batch_{batch_idx + 1}.json") + else: + debug_file = Path("debug_ocr_texts.json") + + try: + final_payload = { + "workflow_id": os.getenv("COZE_WORKFLOW_ID", ""), + "parameters": { + "input": coze_inputs + } + } + debug_file.write_text(json.dumps(final_payload, ensure_ascii=False, indent=2), encoding='utf-8') + print(f" → Payload 已保存: {debug_file.name}") + except Exception as e: + print(f" ⚠️ 保存调试文件失败: {e}") + + # 调用 Coze 处理当前批次 + print(f" → 调用 Coze 工作流...") + batch_result = self.llm_service.analyze_multiple_reports(coze_inputs) + + # 检查当前批次是否成功 + if batch_result.get("error"): + error_msg = batch_result.get('error') + print(f" ✗ 批次 {batch_idx + 1} 失败: {error_msg}") + raise Exception(f"Coze 工作流调用失败: {error_msg}") + + print(f" ✓ 批次 {batch_idx + 1} 完成") + all_coze_results.append(batch_result) + + # 合并所有批次的结果 + print(f"\n ✓ 所有批次处理完成,合并结果...") + coze_result = self._merge_batch_results(all_coze_results) + + # 保存 Coze 返回结果用于后续测试 + try: + import time + cache_file = Path("coze_result_cache.json") + cache_data = { + "timestamp": time.strftime('%Y-%m-%d %H:%M:%S'), + "report_count": len(reports), + "coze_result": coze_result, + "original_texts": original_texts + } + cache_file.write_text(json.dumps(cache_data, ensure_ascii=False, indent=2), encoding='utf-8') + print(f" → Coze 结果已缓存到: {cache_file.absolute()}") + except Exception as e: + print(f" ⚠️ 缓存保存失败: {e}") + + # 【阶段2】Ollama 优化生成 + print(" [阶段2/2] Ollama 生成 Be.U 风格报告...") + print(" - 将 Coze JSON 转换为专业报告内容") 
+ print(" - 适配 Be.U Wellness Center 模板") + + # 合并原始文本供 Ollama 参考(仍然使用人类可读的文本,而不是 JSON 字符串) + combined_text = "\n\n".join(original_texts) + final_analysis = self._enhance_with_ollama(coze_result, combined_text) + + print(" ✓ 综合报告生成完成") + + return final_analysis + else: + # === 单阶段:直接使用当前 LLM === + print(f" 使用 {self.llm_service.llm_type} 直接生成报告...") + print(f" - 处理 {len(reports)} 份报告") + + # 合并所有报告文本 + combined_text = "\n\n=== 报告分隔 ===\n\n".join([ + f"【文件名】{report['filename']}\n【内容】\n{report['text']}" + for report in reports + ]) + + analysis = self.llm_service.analyze_single_report(combined_text) + + print(" ✓ 报告生成完成") + + return analysis + + def _merge_batch_results(self, batch_results: List[Dict[str, Any]]) -> Dict[str, Any]: + """ + 合并多个批次的 Coze 结果 + """ + if len(batch_results) == 1: + return batch_results[0] + + print(f" - 合并 {len(batch_results)} 个批次的结果...") + + # 合并结果 + merged = { + "summary": "", + "key_findings": [], + "abnormal_items": [], + "risk_assessment": "", + "recommendations": [] + } + + # 收集所有字段 + summaries = [] + risk_assessments = [] + + for idx, result in enumerate(batch_results, 1): + # 摘要 + if result.get("summary"): + summaries.append(f"批次{idx}: {result['summary']}") + + # 关键发现 + if result.get("key_findings"): + merged["key_findings"].extend(result["key_findings"]) + + # 异常指标 + if result.get("abnormal_items"): + merged["abnormal_items"].extend(result["abnormal_items"]) + + # 风险评估 + if result.get("risk_assessment"): + risk_assessments.append(f"批次{idx}: {result['risk_assessment']}") + + # 建议 + if result.get("recommendations"): + merged["recommendations"].extend(result["recommendations"]) + + # 合并摘要和风险评估 + merged["summary"] = "\n\n".join(summaries) if summaries else "未提供摘要" + merged["risk_assessment"] = "\n\n".join(risk_assessments) if risk_assessments else "未提供风险评估" + + print(f" - 合并后: 关键发现 {len(merged['key_findings'])} 项, " + f"异常指标 {len(merged['abnormal_items'])} 项, " + f"建议 {len(merged['recommendations'])} 项") + + return merged + + def 
_enhance_with_ollama(self, coze_result: Dict[str, Any], original_text: str) -> Dict[str, Any]: + """ + 使用 Ollama 分析 Coze 返回的 JSON,生成适配 Be.U 模板的最终报告内容 + """ + try: + import requests + import json + + # 构建给 Ollama 的提示词 + prompt = f"""你是一位专业的医疗报告撰写专家。现在需要基于 Coze 工作流返回的结构化数据,生成一份适合 Be.U Wellness Center 风格的功能医学健康报告。 + +Coze 工作流返回的数据: +{json.dumps(coze_result, ensure_ascii=False, indent=2)} + +原始检测报告文本: +{original_text} + +请基于以上信息,生成一份专业的综合健康报告,包含以下部分(JSON格式): + +1. summary: 综合健康摘要(整体评估,语言专业且易懂) +2. key_findings: 关键发现列表(提取最重要的检测结果) +3. abnormal_items: 异常指标详情(包含 name, value, reference, level) +4. risk_assessment: 健康风险评估(基于所有指标的综合分析) +5. recommendations: 个性化健康建议(具体可执行的建议) + +要求: +- 语言专业但易于理解 +- 突出重点和异常项 +- 提供可操作的健康建议 +- 使用 Be.U Wellness Center 的专业风格 +- 必须返回完整的 JSON 格式 + +请直接返回 JSON,不要有其他文字:""" + + # 调用 Ollama + ollama_host = os.getenv("OLLAMA_HOST", "http://localhost:11434") + ollama_model = os.getenv("OLLAMA_MODEL", "qwen2.5:7b") + + response = requests.post( + f"{ollama_host}/api/generate", + json={ + "model": ollama_model, + "prompt": prompt, + "stream": False + }, + timeout=300 + ) + + if response.status_code == 200: + content = response.json().get("response", "") + # 解析 Ollama 返回的 JSON + return self._parse_ollama_response(content) + else: + print(f" ⚠ Ollama 调用失败,使用 Coze 原始结果") + return coze_result + + except Exception as e: + print(f" ⚠ Ollama 增强失败: {e},使用 Coze 原始结果") + return coze_result + + def _parse_ollama_response(self, response: str) -> Dict[str, Any]: + """解析 Ollama 返回的 JSON""" + try: + import re + import json + + # 尝试提取 JSON + json_match = re.search(r'```(?:json)?\s*(\{.*?\})\s*```', response, re.DOTALL) + if json_match: + json_str = json_match.group(1) + else: + json_match = re.search(r'\{.*\}', response, re.DOTALL) + if json_match: + json_str = json_match.group(0) + else: + json_str = response + + result = json.loads(json_str) + + # 确保必需字段存在 + required_fields = ["summary", "key_findings", "abnormal_items", "risk_assessment", "recommendations"] + for field 
in required_fields: + if field not in result: + result[field] = [] if field in ["key_findings", "abnormal_items", "recommendations"] else "未提供" + + return result + + except Exception as e: + print(f" ⚠ JSON 解析失败: {e}") + return { + "summary": "解析失败", + "key_findings": [], + "abnormal_items": [], + "risk_assessment": "无法生成", + "recommendations": [] + } + + def _direct_ollama_analysis(self, combined_text: str) -> Dict[str, Any]: + """ + Coze 失败后的降级方案:直接使用 Ollama 生成完整报告 + """ + try: + import requests + import json + + print(" → 使用 Ollama 模型生成完整报告...") + + # 构建 Ollama 提示词 + prompt = f"""你是一位专业的医疗报告分析助手。请分析以下医疗报告,提供专业的综合健康评估。 + +医疗报告内容: +{combined_text} + +请按以下 JSON 格式返回分析结果: +{{ + "summary": "综合健康摘要(2-3句话)", + "key_findings": ["关键发现1", "关键发现2", "..."], + "abnormal_items": [ + {{ + "name": "指标名称", + "result": "测量值", + "reference": "参考范围", + "level": "high/low/normal" + }} + ], + "risk_assessment": "健康风险评估(综合说明)", + "recommendations": ["建议1", "建议2", "..."] +}} + +请严格按照 JSON 格式返回,不要添加其他说明文字。""" + + # 调用 Ollama + response = requests.post( + "http://localhost:11434/api/generate", + json={ + "model": "qwen2.5:7b", + "prompt": prompt, + "stream": False + }, + timeout=180 + ) + + if response.status_code == 200: + ollama_response = response.json().get("response", "") + print(f" ✓ Ollama 响应完成") + + # 解析 JSON + import re + json_match = re.search(r'\{.*\}', ollama_response, re.DOTALL) + if json_match: + result = json.loads(json_match.group(0)) + + # 确保必需字段存在 + required_fields = ["summary", "key_findings", "abnormal_items", "risk_assessment", "recommendations"] + for field in required_fields: + if field not in result: + result[field] = [] if field in ["key_findings", "abnormal_items", "recommendations"] else "未提供" + + return result + else: + raise ValueError("无法解析 Ollama 返回的 JSON") + else: + raise Exception(f"Ollama API 返回错误: {response.status_code}") + + except Exception as e: + print(f" ⚠️ Ollama 降级方案也失败: {e}") + return { + "summary": "由于系统问题,暂时无法生成完整分析", + "key_findings": ["OCR 
文本提取完成", "分析服务暂时不可用"], + "abnormal_items": [], + "risk_assessment": "建议稍后重试或使用其他方式分析", + "recommendations": ["联系技术支持", "检查系统配置"] + } + + def _generate_comprehensive_report( + self, + patient_name: str, + reports: List[Dict[str, str]], + analysis: Dict[str, Any] + ) -> str: + """生成综合健康报告 PDF""" + + # 准备扩展的模板数据 + template_data = { + "patient_name": patient_name, + "report_count": len(reports), + "report_list": [r["filename"] for r in reports], + "analysis": analysis, + "generation_date": datetime.now().strftime("%Y年%m月%d日") + } + + # 生成 PDF(使用增强的模板) + pdf_path = self.pdf_service.generate_comprehensive_report( + patient_name=patient_name, + template_data=template_data + ) + + return pdf_path + + def _cleanup_temp_files(self, file_paths: List[str]): + """清理临时文件""" + for file_path in file_paths: + try: + if os.path.exists(file_path): + os.remove(file_path) + print(f" ✓ 已删除临时文件: {Path(file_path).name}") + except Exception as e: + print(f" ⚠ 删除临时文件失败 {Path(file_path).name}: {e}") diff --git a/backend/services/data_store.py b/backend/services/data_store.py new file mode 100644 index 0000000..76ec4b4 --- /dev/null +++ b/backend/services/data_store.py @@ -0,0 +1,135 @@ +import json +import os +from pathlib import Path +from typing import Dict, Any, Optional +from datetime import datetime +import threading + +class DataStore: + """数据持久化存储服务""" + + def __init__(self, storage_dir: str = "data"): + self.storage_dir = Path(storage_dir) + self.storage_dir.mkdir(exist_ok=True) + + # 数据文件路径 + self.data_file = self.storage_dir / "reports_data.json" + + # 内存缓存 + self._cache: Dict[str, Any] = {} + + # 线程锁,防止并发写入冲突 + self._lock = threading.Lock() + + # 启动时加载数据 + self._load_data() + + def _load_data(self): + """从文件加载数据""" + if self.data_file.exists(): + try: + with open(self.data_file, 'r', encoding='utf-8') as f: + self._cache = json.load(f) + print(f"✓ 成功加载 {len(self._cache)} 份报告数据") + except Exception as e: + print(f"⚠ 加载数据失败: {e},将使用空数据") + self._cache = {} + else: + print("✓ 
数据文件不存在,将创建新文件") + self._cache = {} + + def _save_data(self): + """保存数据到文件""" + try: + with self._lock: + # 创建临时文件,避免写入过程中断导致数据损坏 + temp_file = self.data_file.with_suffix('.json.tmp') + with open(temp_file, 'w', encoding='utf-8') as f: + json.dump(self._cache, f, ensure_ascii=False, indent=2) + + # 原子性替换 + temp_file.replace(self.data_file) + except Exception as e: + print(f"⚠ 保存数据失败: {e}") + + def get_all(self) -> Dict[str, Any]: + """获取所有报告数据""" + return self._cache.copy() + + def get(self, file_id: str) -> Optional[Dict[str, Any]]: + """获取单个报告数据""" + return self._cache.get(file_id) + + def set(self, file_id: str, data: Dict[str, Any]) -> None: + """设置/更新报告数据""" + self._cache[file_id] = data + self._save_data() + + def update(self, file_id: str, updates: Dict[str, Any]) -> None: + """更新报告数据的部分字段""" + if file_id in self._cache: + self._cache[file_id].update(updates) + self._save_data() + else: + raise KeyError(f"报告 {file_id} 不存在") + + def delete(self, file_id: str) -> None: + """删除报告数据""" + if file_id in self._cache: + del self._cache[file_id] + self._save_data() + + def exists(self, file_id: str) -> bool: + """检查报告是否存在""" + return file_id in self._cache + + def count(self) -> int: + """获取报告总数""" + return len(self._cache) + + def cleanup_orphaned_files(self, upload_dir: Path) -> int: + """清理孤立的文件(数据库中有记录但文件不存在)""" + cleaned = 0 + orphaned_ids = [] + + for file_id, report in self._cache.items(): + filepath = report.get('filepath') + if filepath and not os.path.exists(filepath): + orphaned_ids.append(file_id) + + for file_id in orphaned_ids: + self.delete(file_id) + cleaned += 1 + + if cleaned > 0: + print(f"✓ 清理了 {cleaned} 条孤立记录") + + return cleaned + + def export_backup(self, backup_path: Optional[str] = None) -> str: + """导出备份""" + if backup_path is None: + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + backup_path = self.storage_dir / f"backup_{timestamp}.json" + else: + backup_path = Path(backup_path) + + with open(backup_path, 'w', encoding='utf-8') as 
f: + json.dump(self._cache, f, ensure_ascii=False, indent=2) + + print(f"✓ 数据已备份到: {backup_path}") + return str(backup_path) + + def import_backup(self, backup_path: str) -> None: + """从备份恢复数据""" + backup_path = Path(backup_path) + if not backup_path.exists(): + raise FileNotFoundError(f"备份文件不存在: {backup_path}") + + with open(backup_path, 'r', encoding='utf-8') as f: + imported_data = json.load(f) + + self._cache.update(imported_data) + self._save_data() + + print(f"✓ 已从备份恢复 {len(imported_data)} 份报告") diff --git a/backend/services/deepseek_health_service.py b/backend/services/deepseek_health_service.py new file mode 100644 index 0000000..1c759bc --- /dev/null +++ b/backend/services/deepseek_health_service.py @@ -0,0 +1,480 @@ +""" +DeepSeek 健康评估与建议生成服务 +用于生成"整体健康状况"和"功能性健康建议"内容 + +优化:优先使用模板中已有的项目解释,只有模板中没有的项目才调用 DeepSeek 生成 +""" + +import os +import json +import requests +from pathlib import Path +from typing import List, Dict, Any, Optional + + +class DeepSeekHealthService: + """DeepSeek 健康内容生成服务""" + + def __init__(self, api_key: Optional[str] = None): + # 未显式传入时回退到环境变量 DEEPSEEK_API_KEY + self.api_key = api_key or os.getenv("DEEPSEEK_API_KEY", "") + self.api_url = "https://api.deepseek.com/v1/chat/completions" + + # 加载模板中的解释 + self.template_explanations = self._load_template_explanations() + + def _load_template_explanations(self) -> Dict[str, Dict[str, str]]: + """加载模板中已有的项目解释""" + explanations_file = Path(__file__).parent.parent / "template_explanations.json" + + if explanations_file.exists(): + try: + with open(explanations_file, 'r', encoding='utf-8') as f: + explanations = json.load(f) + print(f" ✓ 已加载 {len(explanations)} 个模板解释") + return explanations + except Exception as e: + print(f" ⚠️ 加载模板解释失败: {e}") + + return {} + + def get_template_explanation(self, abb: str) -> Dict[str, str]: + """ + 获取模板中的项目解释 + + Args: + abb: 项目缩写 + + Returns: + {"clinical_en": "...", "clinical_cn": "..."} 或空字典 + """ + # 尝试多种匹配方式 + abb_upper = abb.upper().strip() + + # 直接匹配 + if abb_upper in self.template_explanations: + return
self.template_explanations[abb_upper] + + # 去除特殊字符后匹配(键同样转为大写,避免大小写不一致导致匹配失败) + abb_clean = ''.join(c for c in abb_upper if c.isalnum()) + for key, value in self.template_explanations.items(): + key_clean = ''.join(c for c in key.upper() if c.isalnum()) + if abb_clean == key_clean: + return value + + return {} + + def is_available(self) -> bool: + """检查服务是否可用""" + return bool(self.api_key) + + def call_deepseek(self, prompt: str) -> str: + """调用 DeepSeek API""" + if not self.api_key: + raise ValueError("未配置 DEEPSEEK_API_KEY") + + headers = { + "Authorization": f"Bearer {self.api_key}", + "Content-Type": "application/json" + } + + data = { + "model": "deepseek-chat", + "messages": [{"role": "user", "content": prompt}], + "temperature": 0.1, + "max_tokens": 8000 + } + + response = requests.post( + self.api_url, + headers=headers, + json=data, + timeout=120 + ) + response.raise_for_status() + return response.json()["choices"][0]["message"]["content"] + + def collect_abnormal_items(self, analysis: Dict[str, Any]) -> List[Dict[str, str]]: + """ + 从分析结果中收集异常项 + + Args: + analysis: LLM 分析结果,包含 abnormal_items 字段 + + Returns: + 异常项列表 + """ + abnormal_items = [] + + # 从 abnormal_items 字段提取 + raw_items = analysis.get("abnormal_items", []) + + for item in raw_items: + if isinstance(item, dict): + abnormal_items.append({ + "name": item.get("name", ""), + "abb": item.get("abb", item.get("name", "")), + "result": str(item.get("result", item.get("value", ""))), + "reference": item.get("reference", ""), + "unit": item.get("unit", ""), + "level": item.get("level", ""), + "point": "↑" if item.get("level") == "high" else ("↓" if item.get("level") == "low" else "") + }) + elif isinstance(item, str): + # 如果是字符串格式,仅保留名称,其余字段留空 + abnormal_items.append({ + "name": item, + "abb": "", + "result": "", + "reference": "", + "unit": "", + "level": "", + "point": "" + }) + + return abnormal_items + + def get_item_explanations(self, abnormal_items: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]: + """ + 为异常项获取解释(优先使用模板中的解释,缺失的才调用
DeepSeek 生成) + + Args: + abnormal_items: 异常项列表 + + Returns: + { + "ABB": {"clinical_en": "...", "clinical_cn": "..."}, + ... + } + """ + explanations = {} + items_need_generation = [] + + print("\n 📋 检查模板中的项目解释...") + + for item in abnormal_items: + abb = item.get("abb", "").upper().strip() + name = item.get("name", "") + + if not abb: + continue + + # 尝试从模板获取解释 + template_exp = self.get_template_explanation(abb) + + if template_exp and template_exp.get("clinical_en") and template_exp.get("clinical_cn"): + explanations[abb] = template_exp + print(f" ✓ {abb}: 使用模板解释") + else: + items_need_generation.append(item) + print(f" ○ {abb}: 需要生成解释") + + # 如果有需要生成的项目,调用 DeepSeek + if items_need_generation and self.api_key: + print(f"\n 🤖 调用 DeepSeek 为 {len(items_need_generation)} 个项目生成解释...") + generated = self._generate_missing_explanations(items_need_generation) + explanations.update(generated) + + return explanations + + def _generate_missing_explanations(self, items: List[Dict[str, str]]) -> Dict[str, Dict[str, str]]: + """ + 调用 DeepSeek 为缺失解释的项目生成临床意义 + + Args: + items: 需要生成解释的项目列表 + + Returns: + 生成的解释字典 + """ + if not items: + return {} + + # 构建项目描述 + items_desc = [] + for item in items: + desc = f"- {item['abb']}: {item['name']}" + if item.get('result'): + desc += f", 结果: {item['result']}" + if item.get('unit'): + desc += f" {item['unit']}" + if item.get('reference'): + desc += f", 参考范围: {item['reference']}" + items_desc.append(desc) + + prompt = f"""你是一位医学检验专家,请为以下医疗检测项目生成临床意义解释。 + +## 需要解释的项目: +{chr(10).join(items_desc)} + +## 要求: +1. 为每个项目提供英文和中文的临床意义解释 +2. 解释应包含:该指标的作用、正常范围的意义、异常时可能的原因 +3. 语言专业但易于理解 +4. 每个解释约50-100字 + +## 输出格式(JSON): +```json +{{ + "ABB1": {{ + "clinical_en": "English clinical significance...", + "clinical_cn": "中文临床意义..." + }}, + "ABB2": {{ + "clinical_en": "...", + "clinical_cn": "..." 
+ }} +}} +``` + +只返回JSON,不要其他说明。""" + + try: + response = self.call_deepseek(prompt) + + # 解析 JSON + if "```json" in response: + response = response.split("```json")[1].split("```")[0] + elif "```" in response: + response = response.split("```")[1].split("```")[0] + + result = json.loads(response.strip()) + print(f" ✓ 成功生成 {len(result)} 个项目的解释") + return result + except Exception as e: + print(f" ✗ 生成解释失败: {e}") + return {} + + def generate_health_assessment(self, abnormal_items: List[Dict[str, str]]) -> Dict[str, Any]: + """ + 生成"整体健康状况"评估内容 + + Args: + abnormal_items: 异常项列表 + + Returns: + 包含多个小节的健康评估内容 + """ + if not self.api_key or not abnormal_items: + return {"sections": []} + + # 构建异常项描述 + abnormal_desc = [] + for item in abnormal_items: + direction = "偏高" if item.get("point") in ["↑", "H", "高"] or item.get("level") == "high" else "偏低" + desc = f"- {item['name']}" + if item.get('abb'): + desc += f" ({item['abb']})" + desc += f": {item['result']}" + if item.get('unit'): + desc += f" {item['unit']}" + desc += f" ({direction}" + if item.get('reference'): + desc += f", 参考范围: {item['reference']}" + desc += ")" + abnormal_desc.append(desc) + + prompt = f"""你是一位功能医学专家,请根据以下所有异常检测指标,撰写"整体健康状况评估"的内容。 + +## 异常指标: +{chr(10).join(abnormal_desc)} + +## 要求: +1. 根据异常指标的类型,自动分成合适的小节(如血液学、内分泌、免疫、代谢等,根据实际异常项决定) +2. 每个小节包含英文和中文两个版本 +3. 从功能医学和整体健康角度分析 +4. 解释可能的原因和健康影响 +5. 语言专业但易于理解 +6. 每个小节的每个语言版本约150-250字 + +## 输出格式(JSON): +```json +{{ + "sections": [ + {{ + "title_en": "(I) Section Title in English", + "title_cn": "(一)中文小节标题", + "content_en": "English analysis content...", + "content_cn": "中文分析内容..." 
+ }} + ] +}} +``` + +只返回JSON,不要其他说明。根据实际异常项情况决定分几个小节,不要硬套固定模板。""" + + try: + response = self.call_deepseek(prompt) + + # 解析 JSON + if "```json" in response: + response = response.split("```json")[1].split("```")[0] + elif "```" in response: + response = response.split("```")[1].split("```")[0] + + result = json.loads(response.strip()) + print(f" ✓ 生成健康评估内容,共 {len(result.get('sections', []))} 个小节") + return result + except Exception as e: + print(f" ✗ 生成健康评估内容失败: {e}") + return {"sections": []} + + def generate_health_advice(self, abnormal_items: List[Dict[str, str]]) -> Dict[str, Any]: + """ + 生成"功能性健康建议"内容 + + Args: + abnormal_items: 异常项列表 + + Returns: + 包含5个固定小节的健康建议内容 + """ + if not self.api_key or not abnormal_items: + return {"sections": []} + + # 异常项描述 + abnormal_desc = [] + for item in abnormal_items: + direction = "偏高" if item.get("point") in ["↑", "H", "高"] or item.get("level") == "high" else "偏低" + desc = f"- {item['name']}" + if item.get('abb'): + desc += f" ({item['abb']})" + desc += f": {item['result']}" + if item.get('unit'): + desc += f" {item['unit']}" + desc += f" ({direction})" + abnormal_desc.append(desc) + + prompt = f"""你是一位功能医学专家,请根据以下异常检测指标,撰写"功能医学健康建议"的内容。 + +## 异常指标: +{chr(10).join(abnormal_desc)} + +## 要求: +1. 必须包含以下5个固定小节(按顺序): + - Nutrition Intervention 营养干预 + - Exercise Intervention 运动干预 + - Sleep & Stress Management 睡眠与压力管理 + - Lifestyle Adjustment 生活方式调整 + - Long-term Follow-up Plan 长期随访计划 +2. 每个小节针对这些异常指标提供具体、可执行的建议 +3. 从功能医学角度出发,强调预防和整体调理 +4. 每个小节包含3-5条具体建议措施 +5. 语言专业但易于理解 +6. 分别提供英文和中文版本 +7. 每个小节的每个语言版本约200-300字 + +## 输出格式(JSON): +```json +{{ + "sections": [ + {{ + "title_en": "Nutrition Intervention", + "title_cn": "营养干预", + "content_en": "English nutrition advice...", + "content_cn": "中文营养建议..." + }}, + {{ + "title_en": "Exercise Intervention", + "title_cn": "运动干预", + "content_en": "...", + "content_cn": "..." 
+ }}, + {{ + "title_en": "Sleep & Stress Management", + "title_cn": "睡眠与压力管理", + "content_en": "...", + "content_cn": "..." + }}, + {{ + "title_en": "Lifestyle Adjustment", + "title_cn": "生活方式调整", + "content_en": "...", + "content_cn": "..." + }}, + {{ + "title_en": "Long-term Follow-up Plan", + "title_cn": "长期随访计划", + "content_en": "...", + "content_cn": "..." + }} + ] +}} +``` + +只返回JSON,不要其他说明。""" + + try: + response = self.call_deepseek(prompt) + + # 解析 JSON + if "```json" in response: + response = response.split("```json")[1].split("```")[0] + elif "```" in response: + response = response.split("```")[1].split("```")[0] + + result = json.loads(response.strip()) + print(f" ✓ 生成健康建议内容,共 {len(result.get('sections', []))} 个小节") + return result + except Exception as e: + print(f" ✗ 生成健康建议内容失败: {e}") + return {"sections": []} + + def generate_health_content(self, analysis: Dict[str, Any]) -> Dict[str, Any]: + """ + 生成完整的健康评估和建议内容 + + 优化:优先使用模板中已有的项目解释,只有模板中没有的项目才调用 DeepSeek 生成 + + Args: + analysis: LLM 分析结果 + + Returns: + 包含 health_assessment, health_advice, item_explanations 的字典 + """ + if not self.is_available(): + print(" ⚠️ DeepSeek API Key 未配置,跳过健康内容生成") + return {} + + print("\n============================================================") + print("DeepSeek 健康内容生成") + print("============================================================") + + # 收集异常项 + print("\n 📝 正在收集异常项...") + abnormal_items = self.collect_abnormal_items(analysis) + + if not abnormal_items: + print(" ℹ️ 没有检测到异常项目,跳过内容生成") + return {} + + print(f" 发现 {len(abnormal_items)} 个异常项目:") + for item in abnormal_items[:10]: + direction = "↑" if item.get("level") == "high" or item.get("point") == "↑" else "↓" + print(f" - {item['name']}: {item['result']} {direction}") + if len(abnormal_items) > 10: + print(f" ... 
等共 {len(abnormal_items)} 项") + + # Get item explanations (prefer the template; generate only the missing ones) + item_explanations = self.get_item_explanations(abnormal_items) + + # Count where the explanations came from + template_count = sum(1 for abb in item_explanations if self.get_template_explanation(abb)) + generated_count = len(item_explanations) - template_count + print(f"\n 📊 解释来源统计: 模板 {template_count} 个, DeepSeek生成 {generated_count} 个") + + # Generate the health assessment + print("\n 🤖 正在调用 DeepSeek 生成整体健康状况...") + health_assessment = self.generate_health_assessment(abnormal_items) + + # Generate the health advice + print("\n 🤖 正在调用 DeepSeek 生成功能性健康建议...") + health_advice = self.generate_health_advice(abnormal_items) + + print("\n ✓ 健康内容生成完成") + + return { + "health_assessment": health_assessment, + "health_advice": health_advice, + "abnormal_items": abnormal_items, + "item_explanations": item_explanations # explanation for each item + } diff --git a/backend/services/llm_service.py b/backend/services/llm_service.py new file mode 100644 index 0000000..6cd2069 --- /dev/null +++ b/backend/services/llm_service.py @@ -0,0 +1,506 @@ +import os +import json +from typing import Dict, Any, List + +class LLMService: + """LLM service. Supports DeepSeek, OpenAI, Coze workflows, and local Ollama, with a mock fallback.""" + + def __init__(self): + self.llm_type = self._detect_llm_type() + self._initialize_llm() + + def _detect_llm_type(self) -> str: + """Detect which LLM backend is available.""" + # Prefer DeepSeek when it is configured + if os.getenv("DEEPSEEK_API_KEY") and os.getenv("USE_DEEPSEEK_LLM", "true").lower() == "true": + return "deepseek" + # Then OpenAI + elif os.getenv("OPENAI_API_KEY"): + return "openai" + # Then Coze, when configured + elif os.getenv("COZE_API_KEY") and os.getenv("COZE_WORKFLOW_ID"): + return "coze" + elif os.getenv("OLLAMA_HOST") or self._check_ollama_available(): + return "ollama" + else: + return "mock" + + def _check_ollama_available(self) -> bool: + """Check whether a local Ollama server answers on the default port.""" + try: + import requests + response = requests.get("http://localhost:11434/api/tags", timeout=2) + return response.status_code == 200 + except Exception: + return False + + def _initialize_llm(self): + """Initialize the LLM client.""" + if self.llm_type == 
"deepseek": + try: + self.deepseek_api_key = os.getenv("DEEPSEEK_API_KEY") + self.deepseek_api_url = os.getenv("DEEPSEEK_API_BASE", "https://api.deepseek.com") + "/v1/chat/completions" + self.model = os.getenv("DEEPSEEK_MODEL", "deepseek-chat") + print(f"✓ 使用 DeepSeek API (模型: {self.model})") + except Exception as e: + print(f"⚠ DeepSeek 初始化失败: {e}") + self.llm_type = "mock" + + elif self.llm_type == "openai": + try: + from openai import OpenAI + + # 支持自定义API端点 + api_key = os.getenv("OPENAI_API_KEY") + api_base = os.getenv("OPENAI_API_BASE") + + if api_base: + self.client = OpenAI(api_key=api_key, base_url=api_base) + else: + self.client = OpenAI(api_key=api_key) + + self.model = os.getenv("OPENAI_MODEL", "gpt-3.5-turbo") + print(f"✓ 使用 OpenAI API (模型: {self.model})") + except Exception as e: + print(f"⚠ OpenAI 初始化失败: {e}") + self.llm_type = "mock" + + elif self.llm_type == "ollama": + try: + import requests + self.ollama_host = os.getenv("OLLAMA_HOST", "http://localhost:11434") + # 默认使用已安装的 qwen2.5:7b 模型,如需更换可通过 OLLAMA_MODEL 环境变量覆盖 + self.model = os.getenv("OLLAMA_MODEL", "qwen2.5:7b") + print(f"✓ 使用 Ollama (模型: {self.model})") + except Exception as e: + print(f"⚠ Ollama 初始化失败: {e}") + self.llm_type = "mock" + + elif self.llm_type == "coze": + try: + # Coze 工作流调用所需配置,通过环境变量提供 + self.coze_api_url = os.getenv("COZE_API_URL", "https://api.coze.cn/v1/workflow/run") + self.coze_api_key = os.getenv("COZE_API_KEY") + self.coze_workflow_id = os.getenv("COZE_WORKFLOW_ID") + + if not self.coze_api_key or not self.coze_workflow_id: + raise ValueError("COZE_API_KEY 或 COZE_WORKFLOW_ID 未配置") + + print("✓ 使用 Coze 工作流作为LLM") + except Exception as e: + print(f"⚠ Coze 初始化失败: {e}") + self.llm_type = "mock" + + if self.llm_type == "mock": + print("✓ 使用模拟LLM模式(用于演示)") + + def analyze_single_report(self, report_text: str) -> Dict[str, Any]: + """分析单个报告""" + prompt = f"""请分析以下医疗报告,提取关键信息: + +{report_text} + +请提供: +1. 摘要 +2. 关键发现 +3. 异常指标 +4. 风险评估 +5. 
建议 + +以JSON格式返回结果。 +""" + + if self.llm_type == "deepseek": + return self._call_deepseek(prompt) + elif self.llm_type == "openai": + return self._call_openai(prompt) + elif self.llm_type == "ollama": + return self._call_ollama(prompt) + elif self.llm_type == "coze": + # 对于 Coze,直接将原始报告文本传给工作流,由工作流内部负责解析与生成结构化结果 + print(f" → 准备调用 Coze 工作流...") + coze_input = [{ + "filename": "single_report", + "text": report_text, + }] + result = self._call_coze(coze_input) # 单个报告也作为数组传入 + print(f" ← Coze 调用返回") + return result + else: + return self._mock_analysis(report_text) + + def analyze_multiple_reports(self, report_texts: List[str]) -> Dict[str, Any]: + """ + 分析多个报告(Coze专用) + report_texts: 报告文本的数组,每个元素是一个PDF的文本 + """ + if self.llm_type == "coze": + print(f" → 准备调用 Coze 工作流(传入 {len(report_texts)} 个报告)...") + result = self._call_coze(report_texts) + print(f" ← Coze 调用返回") + return result + else: + # 其他LLM类型合并文本后调用 + combined = "\n\n".join(report_texts) + return self.analyze_single_report(combined) + + def _call_openai(self, prompt: str) -> Dict[str, Any]: + """调用OpenAI API""" + try: + response = self.client.chat.completions.create( + model=self.model, + messages=[ + {"role": "system", "content": "你是一位专业的医疗报告分析助手。"}, + {"role": "user", "content": prompt} + ], + temperature=0.7, + max_tokens=2000 + ) + + content = response.choices[0].message.content + return self._parse_llm_response(content) + + except Exception as e: + return { + "error": f"OpenAI API 调用失败: {str(e)}", + "summary": "分析失败", + "key_findings": [], + "abnormal_items": [], + "risk_assessment": "无法评估", + "recommendations": [] + } + + def _call_deepseek(self, prompt: str) -> Dict[str, Any]: + """调用 DeepSeek API""" + try: + import requests + + headers = { + "Authorization": f"Bearer {self.deepseek_api_key}", + "Content-Type": "application/json" + } + + data = { + "model": self.model, + "messages": [ + {"role": "system", "content": "你是一位专业的医疗报告分析助手。请以JSON格式返回分析结果。"}, + {"role": "user", "content": prompt} + ], + 
"temperature": 0.3, + "max_tokens": 4000 + } + + response = requests.post( + self.deepseek_api_url, + headers=headers, + json=data, + timeout=120 + ) + + if response.status_code == 200: + content = response.json()["choices"][0]["message"]["content"] + return self._parse_llm_response(content) + else: + raise Exception(f"DeepSeek 返回错误: {response.status_code} - {response.text}") + + except Exception as e: + return { + "error": f"DeepSeek API 调用失败: {str(e)}", + "summary": "分析失败", + "key_findings": [], + "abnormal_items": [], + "risk_assessment": "无法评估", + "recommendations": [] + } + + def _call_coze(self, report_texts: List[str]) -> Dict[str, Any]: + """ + 调用 Coze 工作流 API(流式模式),对医疗报告进行分析 + report_texts: 报告文本数组,每个元素是一个PDF的文本 + """ + try: + import time + from cozepy import Coze, TokenAuth, COZE_CN_BASE_URL, WorkflowEventType + + api_key = getattr(self, "coze_api_key", os.getenv("COZE_API_KEY")) + workflow_id = getattr(self, "coze_workflow_id", os.getenv("COZE_WORKFLOW_ID")) + max_retries = int(os.getenv("COZE_MAX_RETRIES", "3")) + + if not api_key or not workflow_id: + raise ValueError("未配置 Coze API 所需的 COZE_API_KEY 或 COZE_WORKFLOW_ID") + + print(f" → 调用 Coze 工作流(流式模式)...") + print(f" → Workflow ID: {workflow_id}") + print(f" → 数组元素个数: {len(report_texts)}") + total_chars = 0 + for item in report_texts: + if isinstance(item, str): + total_chars += len(item) + elif isinstance(item, dict): + text_value = item.get("text") + if isinstance(text_value, str): + total_chars += len(text_value) + print(f" → 总文本长度: {total_chars} 字符") + print(f" → 请求发送时间: {time.strftime('%H:%M:%S')}") + + # 初始化 Coze 客户端 + coze = Coze(auth=TokenAuth(token=api_key), base_url=COZE_CN_BASE_URL) + + # 添加请求开始时间 + import time as time_module + start = time_module.time() + + last_error = None + for attempt in range(max_retries): + try: + if attempt > 0: + print(f" → 重试 {attempt}/{max_retries - 1}...") + + # 调用流式接口 + stream = coze.workflows.runs.stream( + workflow_id=workflow_id, + parameters={"input": 
report_texts} + ) + + print(f" ✓ 已连接到流式接口,等待执行...") + + # 处理事件流 + content_result = None + event_count = 0 + for event in stream: + event_count += 1 + print(f" [事件 {event_count}] 类型: {event.event}") + + if event.event == WorkflowEventType.MESSAGE: + # 打印进度信息 + if hasattr(event, 'message') and event.message: + msg = event.message + node_type = getattr(msg, 'node_type', None) + node_title = getattr(msg, 'node_title', None) + node_is_finish = getattr(msg, 'node_is_finish', None) + content = getattr(msg, 'content', None) + + print(f" 节点标题: {node_title}") + print(f" 节点类型: {node_type}") + print(f" 是否完成: {node_is_finish}") + print(f" 内容长度: {len(content) if content else 0}") + + if node_title: + print(f" ⏳ 执行节点: {node_title} (类型: {node_type})") + + # 检查是否为结束节点(使用 node_title 判断) + if node_title == "End" and node_is_finish and content: + print(f" ✓ 工作流执行完成,获取到结果") + content_result = content + break + + elif event.event == WorkflowEventType.ERROR: + error_msg = str(event.error) if hasattr(event, 'error') else "Unknown error" + print(f" ✗ 错误事件: {error_msg}") + raise Exception(f"工作流执行错误: {error_msg}") + + elif event.event == WorkflowEventType.INTERRUPT: + print(f" ⚠️ 工作流需要交互,暂不支持") + raise Exception("工作流需要人工交互,当前不支持") + + if not content_result: + raise Exception("未获取到工作流执行结果") + + elapsed = time_module.time() - start + print(f" → 收到完整结果 (耗时: {elapsed:.1f}秒)") + print(f" → 结果数据: {content_result[:200]}...") + + # 解析 content 字段(通常包含 JSON 格式的输出) + # content 格式示例: {"output":"```json\n{...}\n```"} + if isinstance(content_result, str): + # 尝试解析为 JSON + try: + content_json = json.loads(content_result) + output = content_json.get("output", content_result) + except json.JSONDecodeError: + output = content_result + + # 如果 output 包含 markdown 格式的 JSON,提取出来 + if isinstance(output, str) and "```json" in output: + import re + json_match = re.search(r'```json\s*\n(.*?)\n```', output, re.DOTALL) + if json_match: + output = json_match.group(1) + + # 尝试解析最终的 JSON + data = {"code": 0, "data": 
{"output": output}} + else: + data = {"code": 0, "data": {"output": content_result}} + + # 参考间隔定时脚本的返回结构:{ code: 0, data: { output: ... } } + if isinstance(data, dict) and data.get("code") == 0: + raw_data = data.get("data", {}) + if isinstance(raw_data, str): + try: + raw_data = json.loads(raw_data) + except json.JSONDecodeError: + # data.data 为字符串,直接按 LLM 文本解析 + return self._parse_llm_response(raw_data) + + # 期望 workflow 在 data.output 中返回结果 + output = raw_data.get("output", raw_data) + + # 如果 output 还是字符串,再次解析 + if isinstance(output, str): + try: + output = json.loads(output) + print(f" ✓ Coze 返回的 output 需要二次解析") + except json.JSONDecodeError: + print(f" ⚠️ output 为字符串但无法解析为JSON,尝试文本解析") + return self._parse_llm_response(output) + + if isinstance(output, dict): + # 如果已经是结构化结果,直接补齐字段 + print(f" ✓ Coze 返回结构化数据") + print(f" → 包含字段: {list(output.keys())}") + + result = output + required_fields = [ + "summary", + "key_findings", + "abnormal_items", + "risk_assessment", + "recommendations", + ] + for field in required_fields: + if field not in result: + result[field] = [] if field in [ + "key_findings", + "abnormal_items", + "recommendations", + ] else "未提供" + + print(f" ✓✓ Coze 工作流调用成功!") + return result + + if isinstance(output, str): + # output 为文本,通过原有 JSON 解析逻辑处理 + return self._parse_llm_response(output) + + # 其它类型(列表等),转为字符串后再解析 + return self._parse_llm_response(json.dumps(output, ensure_ascii=False)) + + # code 非 0,视为错误 + last_error = f"Coze API 返回非0 code: {data}" + print(f" ✗ Coze 返回错误: {last_error}") + + except Exception as e: # 包含超时在内的所有请求异常 + last_error = str(e) + print(f" ✗ Coze API 调用失败: {last_error}") + if attempt < max_retries - 1: + # 简单的递增退避等待 + wait_time = (attempt + 1) * 3 + print(f" → 等待 {wait_time} 秒后重试...") + time.sleep(wait_time) + else: + print(f" ✗✗ 已达最大重试次数,放弃调用") + break + + print(f" ✗✗ Coze 工作流调用最终失败: {last_error}") + raise Exception(last_error or "Coze API 调用失败") + + except Exception as e: + return { + "error": f"Coze API 调用失败: {str(e)}", + 
"summary": "分析失败", + "key_findings": [], + "abnormal_items": [], + "risk_assessment": "无法评估", + "recommendations": [] + } + + def _call_ollama(self, prompt: str) -> Dict[str, Any]: + """调用Ollama API""" + try: + import requests + + response = requests.post( + f"{self.ollama_host}/api/generate", + json={ + "model": self.model, + "prompt": prompt, + "stream": False + }, + timeout=300 + ) + + if response.status_code == 200: + content = response.json().get("response", "") + return self._parse_llm_response(content) + else: + raise Exception(f"Ollama 返回错误: {response.status_code}") + + except Exception as e: + return { + "error": f"Ollama API 调用失败: {str(e)}", + "summary": "分析失败", + "key_findings": [], + "abnormal_items": [], + "risk_assessment": "无法评估", + "recommendations": [] + } + + def _parse_llm_response(self, response: str) -> Dict[str, Any]: + """解析LLM响应""" + try: + # 尝试提取JSON内容 + import re + + # 查找JSON代码块 + json_match = re.search(r'```(?:json)?\s*(\{.*?\})\s*```', response, re.DOTALL) + if json_match: + json_str = json_match.group(1) + else: + # 查找裸JSON + json_match = re.search(r'\{.*\}', response, re.DOTALL) + if json_match: + json_str = json_match.group(0) + else: + json_str = response + + result = json.loads(json_str) + + # 验证必需字段 + required_fields = ["summary", "key_findings", "abnormal_items", "risk_assessment", "recommendations"] + for field in required_fields: + if field not in result: + result[field] = [] if field in ["key_findings", "abnormal_items", "recommendations"] else "未提供" + + return result + + except: + # 解析失败,返回原始文本 + return { + "summary": "无法解析LLM响应", + "raw_response": response, + "key_findings": [], + "abnormal_items": [], + "risk_assessment": "解析失败", + "recommendations": [] + } + + def _mock_analysis(self, report_text: str) -> Dict[str, Any]: + """模拟分析结果""" + return { + "summary": "这是一份血常规检查报告。根据报告内容,各项指标均在正常参考范围内,未发现明显异常。", + "key_findings": [ + "白细胞计数: 6.5×10^9/L,正常范围", + "红细胞计数: 4.8×10^12/L,正常范围", + "血红蛋白: 145 g/L,正常范围", + "血小板计数: 
220×10^9/L,正常范围" + ], + "abnormal_items": [], + "risk_assessment": "低风险。所有检测指标均在正常范围内,未发现需要关注的异常项。建议定期体检,保持健康生活方式。", + "recommendations": [ + "继续保持良好的生活习惯", + "定期进行健康体检(建议每年一次)", + "保持均衡饮食和适量运动", + "如有不适症状,及时就医" + ], + "note": "这是一个模拟的分析结果。实际使用时请配置 OpenAI API 或本地 Ollama 模型。" + } diff --git a/backend/services/ocr_service.py b/backend/services/ocr_service.py new file mode 100644 index 0000000..82f3ede --- /dev/null +++ b/backend/services/ocr_service.py @@ -0,0 +1,222 @@ +import os +import sys +from pathlib import Path +from typing import Union +import tempfile +import shutil + +class OCRService: + """OCR识别服务 - 支持 MinerU、百度云OCR API、PaddleOCR(生产模式,不支持演示)""" + + def __init__(self): + self.ocr_type = self._detect_ocr_type() + self._initialize_ocr() + + def _detect_ocr_type(self) -> str: + """检测可用的OCR类型""" + # 最优先使用百度云OCR API(速度快、精度高、免费额度足够日常使用) + if os.getenv("BAIDU_OCR_APP_ID") and os.getenv("BAIDU_OCR_API_KEY") and os.getenv("BAIDU_OCR_SECRET_KEY"): + return "baidu_cloud" + # 其次使用 MinerU(最强大的文档解析工具,但速度慢) + elif self._check_mineru(): + return "mineru" + # 再次使用PaddleOCR + elif self._check_paddleocr(): + return "paddleocr" + # 没有可用的OCR + else: + raise RuntimeError( + "❌ 没有可用的OCR引擎!请至少配置以下一种:\n" + "1. MinerU - 将 MinerU-master 文件夹放在桌面\n" + "2. 百度OCR - 配置环境变量 BAIDU_OCR_*\n" + "3. 
PaddleOCR - 运行 pip install paddleocr paddlepaddle" + ) + + def _initialize_ocr(self): + """Initialize the OCR engine.""" + if self.ocr_type == "mineru": + try: + # Add the MinerU checkout to sys.path (hard-coded local path) + mineru_path = Path(r"c:\Users\UI\Desktop\MinerU-master") + if mineru_path.exists() and str(mineru_path) not in sys.path: + sys.path.insert(0, str(mineru_path)) + + from demo.demo import parse_doc + self.mineru_parse = parse_doc + + try: + from torch.serialization import add_safe_globals + from doclayout_yolo.nn.tasks import YOLOv10DetectionModel + add_safe_globals([YOLOv10DetectionModel]) + except Exception: + pass + + print("✓ 使用 MinerU 引擎(高精度文档解析)") + except Exception as e: + raise RuntimeError(f"❌ MinerU 初始化失败: {e}\n请安装完整依赖或使用其他OCR引擎") + + elif self.ocr_type == "baidu_cloud": + try: + from aip import AipOcr + app_id = os.getenv("BAIDU_OCR_APP_ID") + api_key = os.getenv("BAIDU_OCR_API_KEY") + secret_key = os.getenv("BAIDU_OCR_SECRET_KEY") + self.baidu_client = AipOcr(app_id, api_key, secret_key) + print("✓ 使用百度云OCR API(高精度)") + except Exception as e: + raise RuntimeError(f"❌ 百度云OCR初始化失败: {e}\n请检查环境变量配置") + + elif self.ocr_type == "paddleocr": + try: + from paddleocr import PaddleOCR + self.paddle_ocr = PaddleOCR(use_angle_cls=True, lang="ch", show_log=False) + print("✓ 使用 PaddleOCR 引擎(本地离线)") + except Exception as e: + raise RuntimeError(f"❌ PaddleOCR 初始化失败: {e}\n请运行: pip install paddleocr paddlepaddle") + + def _check_mineru(self) -> bool: + """Check whether MinerU is available.""" + try: + mineru_path = Path(r"c:\Users\UI\Desktop\MinerU-master") + return mineru_path.exists() and (mineru_path / "demo" / "demo.py").exists() + except Exception: + return False + + def _check_paddleocr(self) -> bool: + """Check whether PaddleOCR is available.""" + try: + import paddleocr + return True + except ImportError: + return False + + def extract_text(self, file_path: Union[str, Path]) -> str: + """Extract text from an image or a PDF.""" + file_path = str(file_path) + file_ext = Path(file_path).suffix.lower() + + if file_ext == '.pdf': + return self._extract_from_pdf(file_path) + else: + return 
self._extract_from_image(file_path) + + def _extract_from_image(self, image_path: str) -> str: + """Extract text from an image.""" + if self.ocr_type == "mineru": + return self._extract_with_mineru(image_path) + elif self.ocr_type == "baidu_cloud": + return self._extract_with_baidu_cloud(image_path) + elif self.ocr_type == "paddleocr": + return self._extract_with_paddleocr(image_path) + else: + raise RuntimeError("OCR引擎未正确初始化") + + def _extract_with_mineru(self, file_path: str) -> str: + """Extract text with MinerU (handles both PDFs and images).""" + try: + # Create a temporary output directory + temp_dir = tempfile.mkdtemp(prefix="mineru_") + + try: + # Run the MinerU parser + file_path_obj = Path(file_path) + self.mineru_parse( + path_list=[file_path_obj], + output_dir=temp_dir, + lang="ch", # Chinese + backend="pipeline", # use pipeline mode + method="auto" # auto-detect + ) + + # Read the generated markdown file + md_files = list(Path(temp_dir).rglob("*.md")) + if md_files: + # Prefer content files over auxiliary layout / span / origin files + content_files = [ + f for f in md_files + if not any(x in f.stem for x in ['layout', 'span', 'origin']) + ] + target_files = content_files or md_files + with open(target_files[0], 'r', encoding='utf-8') as f: + content = f.read() + return content if content.strip() else "未识别到文本内容" + + return "未识别到文本内容" + + finally: + # Best-effort cleanup of the temporary directory + try: + shutil.rmtree(temp_dir) + except Exception: + pass + + except Exception as e: + return f"MinerU识别出错: {str(e)}" + + def _extract_with_baidu_cloud(self, image_path: str) -> str: + """Extract text with the Baidu Cloud OCR API.""" + try: + # Read the image bytes + with open(image_path, 'rb') as f: + image_data = f.read() + + # Call general text recognition (high-accuracy version) + result = self.baidu_client.accurateBasic(image_data) + + if 'error_code' in result: + return f"百度OCR错误 ({result['error_code']}): {result.get('error_msg', '未知错误')}" + + # Collect the recognized text + if 'words_result' in result: + text_lines = [item['words'] for item in result['words_result']] + return "\n".join(text_lines) if text_lines else "未识别到文本内容" + + return "未识别到文本内容" + + except Exception as e: + return f"百度云OCR识别出错: {str(e)}" + + def _extract_with_paddleocr(self, 
image_path: str) -> str: + """Extract text with PaddleOCR.""" + try: + result = self.paddle_ocr.ocr(image_path, cls=True) + + if not result or not result[0]: + return "未识别到文本内容" + + # Collect every recognized text line + text_lines = [] + for line in result[0]: + if line and len(line) >= 2: + text_lines.append(line[1][0]) + + return "\n".join(text_lines) if text_lines else "未识别到文本内容" + + except Exception as e: + return f"PaddleOCR识别出错: {str(e)}" + + def _extract_from_pdf(self, pdf_path: str) -> str: + """Extract text from a PDF.""" + # Prefer MinerU for PDFs when it is the active engine (best results) + if self.ocr_type == "mineru": + return self._extract_with_mineru(pdf_path) + + # Fallback: read the PDF text layer with pdfplumber (no OCR is performed here) + try: + import pdfplumber + + text_content = [] + with pdfplumber.open(pdf_path) as pdf: + for page in pdf.pages: + text = page.extract_text() + if text: + text_content.append(text) + + return "\n\n".join(text_content) if text_content else "未提取到文本内容" + + except ImportError: + # pdfplumber is not installed; return an install hint instead of text + return "PDF处理需要安装 pdfplumber 库\n可以运行: pip install pdfplumber" + except Exception as e: + return f"PDF处理出错: {str(e)}" + + diff --git a/backend/services/pdf_service.py b/backend/services/pdf_service.py new file mode 100644 index 0000000..7982871 --- /dev/null +++ b/backend/services/pdf_service.py @@ -0,0 +1,267 @@ +import os +from pathlib import Path +from datetime import datetime +from typing import Dict, Any +from jinja2 import Environment, FileSystemLoader + +class PDFService: + """PDF report generation service.""" + + def __init__(self): + # Template directory + self.template_dir = Path(__file__).parent.parent / "templates" + self.template_dir.mkdir(exist_ok=True) + + # Output directory + self.output_dir = Path(__file__).parent.parent / "generated_reports" + self.output_dir.mkdir(exist_ok=True) + + # Initialize the Jinja2 environment + self.jinja_env = Environment(loader=FileSystemLoader(str(self.template_dir))) + + def generate_report( + self, + filename: str, + analysis: Dict[str, Any], + llm_type: str = "Coze Workflow" + ) -> str: + """ + Generate a PDF report + + Args: + filename: original file name + analysis: analysis result + llm_type: LLM backend that produced the analysis + + Returns: + 
生成的PDF文件路径 + """ + try: + # 准备模板数据 + template_data = self._prepare_template_data(filename, analysis, llm_type) + + # 渲染HTML + html_content = self._render_html(template_data) + + # 生成PDF + pdf_path = self._generate_pdf(html_content, filename) + + return pdf_path + + except Exception as e: + raise Exception(f"PDF生成失败: {str(e)}") + + def _prepare_template_data( + self, + filename: str, + analysis: Dict[str, Any], + llm_type: str + ) -> Dict[str, Any]: + """准备模板数据""" + # 处理 key_findings + key_findings = analysis.get("key_findings", []) + if key_findings: + # 如果是对象数组,提取文本 + key_findings = [ + item.get("finding", item.get("text", str(item))) + if isinstance(item, dict) else str(item) + for item in key_findings + ] + + # 处理 abnormal_items + abnormal_items = analysis.get("abnormal_items", []) + if abnormal_items: + processed_items = [] + for item in abnormal_items: + if isinstance(item, dict): + processed_items.append(item) + else: + processed_items.append({"name": str(item)}) + abnormal_items = processed_items + + # 处理 risk_assessment + risk_assessment = analysis.get("risk_assessment", "未提供") + if isinstance(risk_assessment, dict): + # 如果是对象,转换为文本 + parts = [] + if risk_assessment.get("high_risk"): + parts.append(f"【高风险】{'; '.join(risk_assessment['high_risk'])}") + if risk_assessment.get("medium_risk"): + parts.append(f"【中风险】{'; '.join(risk_assessment['medium_risk'])}") + if risk_assessment.get("low_risk"): + parts.append(f"【低风险】{'; '.join(risk_assessment['low_risk'])}") + risk_assessment = "\n".join(parts) if parts else "未检测到明确风险" + + # 处理 recommendations + recommendations = analysis.get("recommendations", []) + if recommendations: + recommendations = [ + item.get("recommendation", item.get("text", str(item))) + if isinstance(item, dict) else str(item) + for item in recommendations + ] + + return { + "filename": filename, + "analysis_date": datetime.now().strftime("%Y年%m月%d日"), + "llm_type": llm_type, + "summary": analysis.get("summary", "暂无摘要"), + "key_findings": 
key_findings, + "abnormal_items": abnormal_items, + "risk_assessment": risk_assessment, + "recommendations": recommendations, + "generation_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S") + } + + def _render_html(self, template_data: Dict[str, Any]) -> str: + """渲染HTML模板""" + template = self.jinja_env.get_template("report_template.html") + return template.render(**template_data) + + def _generate_pdf(self, html_content: str, original_filename: str) -> str: + """将HTML转换为PDF""" + try: + # 生成PDF文件名 + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + base_name = Path(original_filename).stem + pdf_filename = f"{base_name}_分析报告_{timestamp}.pdf" + pdf_path = self.output_dir / pdf_filename + + # 尝试使用 WeasyPrint(推荐,质量更好) + try: + from weasyprint import HTML, CSS + HTML(string=html_content).write_pdf( + str(pdf_path), + stylesheets=[CSS(string='@page { size: A4; margin: 1cm; }')] + ) + except ImportError: + # 降级到 xhtml2pdf(更简单,无需额外依赖) + print(" WeasyPrint 未安装,使用 xhtml2pdf 生成PDF...") + from xhtml2pdf import pisa + with open(pdf_path, "wb") as pdf_file: + pisa_status = pisa.CreatePDF(html_content, dest=pdf_file) + if pisa_status.err: + raise Exception("xhtml2pdf 生成失败") + + return str(pdf_path) + + except Exception as e: + raise Exception(f"PDF转换失败: {str(e)}") + + def get_pdf_file(self, pdf_path: str) -> bytes: + """读取PDF文件内容""" + if not os.path.exists(pdf_path): + raise FileNotFoundError("PDF文件不存在") + + with open(pdf_path, "rb") as f: + return f.read() + + def generate_comprehensive_report( + self, + patient_name: str, + template_data: Dict[str, Any] + ) -> str: + """ + 生成综合健康报告(多份报告整合) + + Args: + patient_name: 患者姓名 + template_data: 包含所有报告数据和分析结果的字典 + + Returns: + 生成的PDF文件路径 + """ + try: + # 准备综合报告模板数据 + comprehensive_data = self._prepare_comprehensive_data(patient_name, template_data) + + # 渲染HTML(使用综合报告模板) + html_content = self._render_comprehensive_html(comprehensive_data) + + # 生成PDF + timestamp = datetime.now().strftime("%Y%m%d_%H%M%S") + pdf_filename = 
f"{patient_name}_综合健康报告_{timestamp}.pdf" + pdf_path = self.output_dir / pdf_filename + + try: + from weasyprint import HTML, CSS + HTML(string=html_content).write_pdf( + str(pdf_path), + stylesheets=[CSS(string='@page { size: A4; margin: 1cm; }')] + ) + except ImportError: + # 如果 WeasyPrint 不可用,使用 xhtml2pdf + from xhtml2pdf import pisa + with open(pdf_path, "wb") as pdf_file: + pisa_status = pisa.CreatePDF(html_content, dest=pdf_file) + if pisa_status.err: + raise Exception("xhtml2pdf 生成失败") + + return str(pdf_path) + + except Exception as e: + raise Exception(f"综合报告生成失败: {str(e)}") + + def _prepare_comprehensive_data( + self, + patient_name: str, + template_data: Dict[str, Any] + ) -> Dict[str, Any]: + """准备综合报告模板数据""" + analysis = template_data.get("analysis", {}) + + # 处理分析结果(与单报告相同的逻辑) + key_findings = analysis.get("key_findings", []) + if key_findings: + key_findings = [ + item.get("finding", item.get("text", str(item))) + if isinstance(item, dict) else str(item) + for item in key_findings + ] + + abnormal_items = analysis.get("abnormal_items", []) + if abnormal_items: + processed_items = [] + for item in abnormal_items: + if isinstance(item, dict): + processed_items.append(item) + else: + processed_items.append({"name": str(item)}) + abnormal_items = processed_items + + risk_assessment = analysis.get("risk_assessment", "未提供") + if isinstance(risk_assessment, dict): + parts = [] + if risk_assessment.get("high_risk"): + parts.append(f"【高风险】{'; '.join(risk_assessment['high_risk'])}") + if risk_assessment.get("medium_risk"): + parts.append(f"【中风险】{'; '.join(risk_assessment['medium_risk'])}") + if risk_assessment.get("low_risk"): + parts.append(f"【低风险】{'; '.join(risk_assessment['low_risk'])}") + risk_assessment = "\n".join(parts) if parts else "未检测到明确风险" + + recommendations = analysis.get("recommendations", []) + if recommendations: + recommendations = [ + item.get("recommendation", item.get("text", str(item))) + if isinstance(item, dict) else str(item) + for item 
in recommendations + ] + + return { + "patient_name": patient_name, + "report_count": template_data.get("report_count", 0), + "report_list": template_data.get("report_list", []), + "generation_date": template_data.get("generation_date", datetime.now().strftime("%Y年%m月%d日")), + "summary": analysis.get("summary", "暂无摘要"), + "key_findings": key_findings, + "abnormal_items": abnormal_items, + "risk_assessment": risk_assessment, + "recommendations": recommendations, + "generation_time": datetime.now().strftime("%Y-%m-%d %H:%M:%S") + } + + def _render_comprehensive_html(self, template_data: Dict[str, Any]) -> str: + """Render the comprehensive-report HTML template.""" + template = self.jinja_env.get_template("comprehensive_report_template.html") + return template.render(**template_data) diff --git a/backend/services/report_integrator.py b/backend/services/report_integrator.py new file mode 100644 index 0000000..5532249 --- /dev/null +++ b/backend/services/report_integrator.py @@ -0,0 +1,106 @@ +from typing import List, Dict, Any +import json + +class ReportIntegrator: + """Integrates and analyzes multiple medical reports.""" + + def __init__(self, llm_service): + self.llm_service = llm_service + + def integrate_reports(self, reports: List[Dict[str, Any]]) -> Dict[str, Any]: + """Integrate several medical reports into one assessment.""" + + if len(reports) == 1: + return self._single_report_summary(reports[0]) + + # Build the prompt for the integrated analysis + prompt = self._build_integration_prompt(reports) + + # Call the LLM; only "openai" and "ollama" are wired up here, so "deepseek", "coze", and "mock" all get the mock result + if self.llm_service.llm_type == "openai": + result = self._call_openai_integration(prompt) + elif self.llm_service.llm_type == "ollama": + result = self._call_ollama_integration(prompt) + else: + result = self._mock_integration(reports) + + # Attach the list of included reports + result["reports_included"] = [ + {"filename": report["filename"], "summary": report["analysis"].get("summary", "无摘要")} + for report in reports + ] + + return result + + def _build_integration_prompt(self, reports: List[Dict[str, Any]]) -> str: + """Build the prompt for the integrated analysis.""" + report_details = [] + for i, report in enumerate(reports, 1): + analysis = 
report["analysis"] + report_details.append(f"【报告{i}: {report['filename']}】\n摘要: {analysis.get('summary', '无')}") + + prompt = f"""你是专业医疗分析专家。整合以下{len(reports)}份报告,提供综合评估。 +{chr(10).join(report_details)} + +请以JSON格式返回: +{{"overall_summary": "整体摘要", "health_trends": ["趋势"], "priority_concerns": [{{"concern": "关注点", "severity": "低/中/高", "description": "描述"}}], "comprehensive_assessment": "综合评估", "integrated_recommendations": ["建议"], "follow_up_suggestions": ["后续建议"]}}""" + return prompt + + def _call_openai_integration(self, prompt: str) -> Dict[str, Any]: + """调用OpenAI进行整合分析""" + try: + response = self.llm_service.client.chat.completions.create( + model=self.llm_service.model, + messages=[{"role": "system", "content": "你是医疗分析专家。"}, {"role": "user", "content": prompt}], + temperature=0.7, max_tokens=3000 + ) + content = response.choices[0].message.content + return self.llm_service._parse_llm_response(content) + except Exception as e: + return self._create_error_result(f"OpenAI分析失败: {str(e)}") + + def _call_ollama_integration(self, prompt: str) -> Dict[str, Any]: + """调用Ollama进行整合分析""" + try: + import requests + response = requests.post(f"{self.llm_service.ollama_host}/api/generate", + json={"model": self.llm_service.model, "prompt": prompt, "stream": False}, timeout=90) + if response.status_code == 200: + return self.llm_service._parse_llm_response(response.json().get("response", "")) + raise Exception(f"Ollama错误: {response.status_code}") + except Exception as e: + return self._create_error_result(f"Ollama分析失败: {str(e)}") + + def _mock_integration(self, reports: List[Dict[str, Any]]) -> Dict[str, Any]: + """模拟整合分析结果""" + total_abnormal = sum(len(report["analysis"].get("abnormal_items", [])) for report in reports) + return { + "overall_summary": f"综合分析了{len(reports)}份报告,发现{total_abnormal}项异常指标。整体健康状况良好。", + "health_trends": ["各项指标整体稳定", "未发现明显恶化趋势", "建议持续监测"], + "priority_concerns": [{"concern": "定期体检", "severity": "低", "description": "建议保持定期体检"}] if total_abnormal == 
0 else [{"concern": "异常指标", "severity": "中", "description": f"发现{total_abnormal}项异常"}], + "comprehensive_assessment": "整体健康状况可控,建议关注生活方式、定期复查。", + "integrated_recommendations": ["保持均衡饮食", "坚持适量运动", "保证充足睡眠", "定期体检"], + "follow_up_suggestions": ["3-6个月后复查关键指标", "如有不适及时就医", "保持健康记录"], + "note": "这是模拟结果。实际使用请配置OpenAI或Ollama。" + } + + def _single_report_summary(self, report: Dict[str, Any]) -> Dict[str, Any]: + """单个报告摘要""" + analysis = report["analysis"] + return { + "overall_summary": f"单份报告分析:{analysis.get('summary', '无摘要')}", + "reports_included": [{"filename": report["filename"], "summary": analysis.get("summary", "无")}], + "health_trends": analysis.get("key_findings", []), + "priority_concerns": [{"concern": item, "severity": "中", "description": "需关注"} for item in analysis.get("abnormal_items", [])[:3]], + "comprehensive_assessment": analysis.get("risk_assessment", "请查看详细分析"), + "integrated_recommendations": analysis.get("recommendations", []), + "follow_up_suggestions": ["定期复查", "咨询医生"] + } + + def _create_error_result(self, error_msg: str) -> Dict[str, Any]: + """创建错误结果""" + return { + "error": error_msg, "overall_summary": "分析失败", "health_trends": [], + "priority_concerns": [], "comprehensive_assessment": "无法完成分析", + "integrated_recommendations": [], "follow_up_suggestions": [] + } diff --git a/backend/services/template_service.py b/backend/services/template_service.py new file mode 100644 index 0000000..17d24c1 --- /dev/null +++ b/backend/services/template_service.py @@ -0,0 +1,67 @@ +import os +import shutil +from pathlib import Path +from typing import Optional + +class TemplateService: + """PDF模板管理服务""" + + def __init__(self, template_dir: str = "templates/pdf"): + self.template_dir = Path(template_dir) + self.template_dir.mkdir(parents=True, exist_ok=True) + + # 默认模板名称 + self.default_template = "be_u_template.pdf" + + def get_template_path(self, template_name: Optional[str] = None) -> Path: + """获取模板文件路径""" + if template_name is None: + template_name = 
self.default_template + + template_path = self.template_dir / template_name + + if not template_path.exists(): + raise FileNotFoundError(f"模板文件不存在: {template_path}") + + return template_path + + def save_template(self, source_path: str, template_name: Optional[str] = None) -> str: + """ + 保存模板文件到系统 + + Args: + source_path: 源文件路径 + template_name: 模板名称(可选) + + Returns: + 保存后的模板路径 + """ + source_path = Path(source_path) + + if not source_path.exists(): + raise FileNotFoundError(f"源文件不存在: {source_path}") + + if template_name is None: + template_name = self.default_template + + dest_path = self.template_dir / template_name + + # 复制文件 + shutil.copy2(source_path, dest_path) + + print(f"✓ 模板已保存: {dest_path}") + return str(dest_path) + + def template_exists(self, template_name: Optional[str] = None) -> bool: + """检查模板是否存在""" + if template_name is None: + template_name = self.default_template + + return (self.template_dir / template_name).exists() + + def list_templates(self) -> list: + """列出所有可用模板""" + if not self.template_dir.exists(): + return [] + + return [f.name for f in self.template_dir.glob("*.pdf")] diff --git a/backend/template_complete.docx b/backend/template_complete.docx new file mode 100644 index 0000000..ff08905 Binary files /dev/null and b/backend/template_complete.docx differ diff --git a/backend/template_explanations.json b/backend/template_explanations.json new file mode 100644 index 0000000..46e2efa --- /dev/null +++ b/backend/template_explanations.json @@ -0,0 +1,438 @@ +{ + "COLOR": { + "clinical_en": "Reflects the appearance and properties of urine, assisting in judging the health status of the urinary system and the body as a whole. 
Normal urine is pale yellow to dark yellow due to containing urobilin and other substances, and is affected by water intake, food, drugs, etc.", + "clinical_cn": "反映尿液的外观性状,辅助判断泌尿系统及身体整体健康状态,正常尿液因含尿色素等呈淡黄色至深黄色,受饮水、食物、药物等影响。" + }, + "PH": { + "clinical_en": "Reflects the acidity and alkalinity of urine, affected by diet, diseases, drugs, etc., helping to judge acid - base balance, urinary system diseases, and the impact of drugs. The normal urine pH is approximately 4.5 - 8.0, and the fluctuation is related to diet. For example, a diet rich in meat makes it more acidic, and a diet rich in vegetables makes it more alkaline.", + "clinical_cn": "反映尿液的酸碱性,受饮食、疾病、药物等影响,帮助判断酸碱平衡、泌尿系统疾病及药物影响等情况。正常尿液pH约 4.5 - 8.0,波动与饮食有关,如食肉多偏酸,食蔬菜多偏碱。" + }, + "TUR": { + "clinical_en": "Reflects the clarity and transparency of urine, assisting in judging abnormal components in the urine. Normal urine is clear and transparent, and abnormal turbidity is related to the presence of cells, crystals, bacteria, etc. in the urine.", + "clinical_cn": "体现尿液的清澈透明程度,辅助判断尿中成分异常,正常尿液清晰透明,浊度异常与尿中含细胞、结晶、细菌等有关。" + }, + "PRO": { + "clinical_en": "Detects the protein content in urine, which is an important indicator for diagnosing kidney diseases and evaluating kidney damage. Normal urine protein is negative or in trace amounts.", + "clinical_cn": "检测尿中蛋白质含量,是诊断肾脏疾病、评估肾脏损伤的重要指标,正常尿蛋白阴性或极微量。" + }, + "BLD/ERY": { + "clinical_en": "Judges whether there are red blood cells and the destruction of red blood cells in the urine, assisting in the diagnosis of urinary system and systemic diseases. There are very few red blood cells in normal urine, and occult blood is negative.", + "clinical_cn": "判断尿中是否有红细胞及红细胞破坏情况,辅助诊断泌尿系统及全身疾病,正常尿中红细胞极少,隐血阴性。" + }, + "GLU": { + "clinical_en": "Reflects the glucose content in urine, assisting in the diagnosis of diabetes and other glucose metabolism disorders. 
Normal urine glucose is negative.", + "clinical_cn": "反映尿中葡萄糖含量,辅助诊断糖尿病及其他糖代谢异常疾病,正常尿糖阴性。" + }, + "SG": { + "clinical_en": "Reflects the concentration of solutes in urine, reflecting the kidney's concentrating and diluting functions. The normal reference value is approximately 1.003 - 1.030, affected by water intake, sweating, etc.", + "clinical_cn": "体现尿液中溶质浓度,反映肾脏浓缩和稀释功能,正常参考值约 1.003 - 1.030,受饮水、出汗等影响。" + }, + "WBC": { + "clinical_en": "Detects white blood cells (leukocytes) in urine, assisting in the diagnosis of urinary system infection and inflammation. Normal urine contains very few white blood cells, and the leukocyte esterase test is negative.", + "clinical_cn": "检测尿中白细胞,辅助诊断泌尿系统感染及炎症,正常尿中白细胞极少,白细胞酯酶阴性。" + }, + "NIT": { + "clinical_en": "Assists in the diagnosis of urinary system infections, because some bacteria can reduce nitrates in the urine to nitrites. It is normally negative.", + "clinical_cn": "辅助诊断泌尿系统感染,因某些细菌可将尿中硝酸盐还原为亚硝酸盐,正常阴性。" + }, + "KET": { + "clinical_en": "Reflects the state of fat decomposition and metabolism in the body, assisting in the diagnosis of diseases such as diabetic ketoacidosis. Normal urine ketone bodies are negative.", + "clinical_cn": "反映体内脂肪分解代谢情况,辅助诊断糖尿病酮症酸中毒等疾病,正常尿酮体阴性。" + }, + "RBC COUNT": { + "clinical_en": "Reflects the total number of red blood cells, is a basic indicator for judging blood diseases such as anemia and erythrocytosis, and is also used to evaluate oxygen - carrying capacity. The normal range for adult males is (4.0 - 5.5) × 10¹²/L, and for adult females is (3.5 - 5.0) × 10¹²/L. 
", + "clinical_cn": "反映红细胞的总体数量,是判断贫血、红细胞增多症等血液疾病的基础指标,也用于评估携氧能力。正常成年男性(4.0-5.5)×10¹²/L,女性(3.5-5.0)×10¹²/L。" + }, + "HB": { + "clinical_en": "A key indicator for evaluating anemia and oxygen - carrying capacity, which, together with the number of red blood cells, reflects the oxygen - carrying state of the blood. The normal range for adult males is 120 - 160g/L, and for adult females is 110 - 150g/L.", + "clinical_cn": "评估贫血及携氧能力的关键指标,与红细胞数量协同反映血液携氧状态,正常成年男性120-160g/L,女性110-150g/L。" + }, + "HCT": { + "clinical_en": "Refers to the percentage of volume occupied by red blood cells in whole blood, reflecting the concentration of red blood cells, assisting in the diagnosis of anemia and erythrocytosis, and also used in the evaluation of dehydration and other conditions. The normal range for adult males is 40% - 50%, and for adult females is 37% - 48%.", + "clinical_cn": "指红细胞在全血中所占体积百分比,反映红细胞的浓缩程度,辅助诊断贫血、红细胞增多症,也用于脱水等情况评估,正常成年男性40%-50%,女性37%-48%。" + }, + "MCV": { + "clinical_en": "Reflects the average size of red blood cells, helping to classify anemia, such as macrocytic, normocytic, and microcytic anemia. The normal range is 80 - 100fL.", + "clinical_cn": "反映红细胞的平均大小,帮助贫血分类,如大细胞性、正细胞性、小细胞性贫血,正常80-100fL。" + }, + "MCH": { + "clinical_en": "Indicates the average amount of hemoglobin contained in each red blood cell, assisting in the diagnosis and classification of anemia. The normal range is 27 - 34pg.", + "clinical_cn": "表示每个红细胞内平均所含血红蛋白的量,辅助贫血诊断与分类,正常27-34pg。" + }, + "MCHC": { + "clinical_en": "Reflects the concentration of hemoglobin in red blood cells, used for anemia classification. The normal range is 320 - 360g/L.", + "clinical_cn": "反映红细胞内血红蛋白的浓度,用于贫血分类,正常320-360g/L。" + }, + "RDW": { + "clinical_en": "Reflects the heterogeneity of red blood cell volume, that is, the degree of dispersion of red blood cell volume, helping in the differential diagnosis of anemia. 
The normal RDW - CV (coefficient of variation) is 11.5% - 14.5%.", + "clinical_cn": "反映红细胞体积大小的异质性,即红细胞体积的离散程度,帮助贫血的鉴别诊断,正常RDW-CV(变异系数)11.5%-14.5%。" + }, + "RBC MORPHOLOGY": { + "clinical_en": "Observing the morphological characteristics of red blood cells (such as normal, spherical, elliptical, sickle - shaped, etc.) is of great value in the diagnosis of the etiology of anemia and hematological diseases. Normal red blood cells are biconcave discs.", + "clinical_cn": "观察红细胞的形态特点(如正常、球形、椭圆形、镰状等),对贫血病因、血液系统疾病诊断有重要价值,正常红细胞为双凹圆盘状。" + }, + "WBC COUNT": { + "clinical_en": "Reflects the total number of white blood cells, is a preliminary indicator for judging infections, inflammation, and certain blood diseases. The normal range for adults is (3.5 - 9.5) × 10⁹/L.", + "clinical_cn": "反映白细胞的总体数量,是判断感染、炎症及某些血液疾病的初步指标,正常成人(3.5-9.5)×10⁹/L。" + }, + "NEUT%": { + "clinical_en": "Neutrophils are the main component of white blood cells and play a key role in anti - infection. Changes in their count and percentage reflect infection, inflammation, and hematopoietic conditions. The normal count is (2.0 - 7.5) × 10⁹/L, and the percentage is 50% - 70%.", + "clinical_cn": "中性粒细胞是白细胞主要成分,在抗感染中起关键作用,数量和百分比变化反映感染、炎症及造血情况,正常数量(2.0-7.5)×10⁹/L,百分比50%-70%。" + }, + "EOS%": { + "clinical_en": "Participates in immune processes such as allergic reactions and anti - parasitic infections. Changes in their count and percentage are significant for the diagnosis of allergic diseases, parasitic infections, etc. The normal count is (0.02 - 0.52) × 10⁹/L, and the percentage is 0.4% - 8%.", + "clinical_cn": "参与过敏反应、抗寄生虫感染等免疫过程,数量和百分比变化对过敏疾病、寄生虫感染等诊断有意义,正常数量(0.02-0.52)×10⁹/L,百分比0.4%-8%。" + }, + "BAS%": { + "clinical_en": "Participates in allergic reactions and releases bioactive substances (such as histamine). Although their number is small, they are valuable for the diagnosis of allergies and certain blood diseases. 
The normal count is (0 - 0.06) × 10⁹/L, and the percentage is 0% - 1%.", + "clinical_cn": "参与过敏反应,释放生物活性物质(如组胺),数量少但对过敏、某些血液疾病诊断有价值,正常数量(0-0.06)×10⁹/L,百分比0%-1%。" + }, + "LYMPH%": { + "clinical_en": "Lymphocytes are an important part of immune cells and participate in specific immunity (such as humoral immunity and cellular immunity). Changes in their count and percentage reflect immune status, infections, and blood diseases, etc. The normal count is (1.1 - 3.2) × 10⁹/L, and the percentage is 20% - 50%.", + "clinical_cn": "淋巴细胞是免疫细胞重要组成,参与特异性免疫(如体液免疫、细胞免疫),数量和百分比变化反映免疫状态、感染及血液疾病等,正常数量(1.1-3.2)×10⁹/L,百分比20%-50%。" + }, + "MONO%": { + "clinical_en": "Monocytes are precursors of macrophages and play a role in anti - infection and immune regulation, participating in the clearance of pathogens, foreign bodies, etc. The normal count is (0.1 - 0.6) × 10⁹/L, and the percentage is 3% - 10%.", + "clinical_cn": "单核细胞是巨噬细胞的前体,在抗感染、免疫调节中起作用,参与清除病原体、异物等,正常数量(0.1-0.6)×10⁹/L,百分比3%-10%。" + }, + "PLT": { + "clinical_en": "Reflects the number of platelets, evaluates hemostatic function, and is important for the diagnosis of hemorrhagic diseases, thrombotic diseases, and treatment monitoring. The normal range is (125 - 350) × 10⁹/L.", + "clinical_cn": "反映血小板数量,评估止血功能,对出血性疾病、血栓性疾病诊断及治疗监测重要,正常(125-350)×10⁹/L。" + }, + "PCT": { + "clinical_en": "Refers to the percentage of volume occupied by platelets in whole blood, related to the number and volume of platelets, assisting in evaluating platelet function and hematopoietic conditions. The normal range is 0.11% - 0.28%.", + "clinical_cn": "指血小板在全血中所占体积百分比,与血小板数量、体积有关,辅助评估血小板功能、造血情况,正常0.11%-0.28%。" + }, + "MPV": { + "clinical_en": "Reflects the average volume of platelets, which has reference value for platelet function and disease diagnosis. 
The normal range is 7 - 11fL.", + "clinical_cn": "反映血小板的平均体积大小,对血小板功能、疾病诊断有参考价值,正常7-11fL。" + }, + "PDW": { + "clinical_en": "Reflects the degree of dispersion of platelet volume, indicating the uniformity of platelet volume, assisting in judging platelet abnormalities. The normal PDW is 15% - 17%.", + "clinical_cn": "反映血小板体积大小的离散程度,说明血小板体积的均一性,辅助判断血小板异常,正常PDW15%-17%。" + }, + "FBS": { + "clinical_en": "A basic indicator for diagnosing diabetes and evaluating blood sugar control. Normal reference value: 3.9 - 6.1mmol/L.", + "clinical_cn": "诊断糖尿病、评估血糖控制的基础指标,正常参考值3.9-6.1mmol/L。" + }, + "HBA1C": { + "clinical_en": "Reflects the average blood sugar level in the past 2 - 3 months, evaluating the control effect of diabetes. Normal reference value: 4% - 6%.", + "clinical_cn": "反映过去2-3个月平均血糖水平,评估糖尿病控制效果,正常参考值4%-6%。" + }, + "TC": { + "clinical_en": "Reflects the overall level of cholesterol in the blood, evaluates the risk of atherosclerosis and cardiovascular diseases. Normal reference value: < 5.2mmol/L (appropriate level), 5.2 - 6.2mmol/L is borderline elevation, > 6.2mmol/L is elevation.", + "clinical_cn": "反映血液中胆固醇总体水平,评估动脉粥样硬化、心血管疾病风险,正常参考值<5.2mmol/L(合适水平),5.2-6.2mmol/L为边缘升高,>6.2mmol/L为升高。" + }, + "TG": { + "clinical_en": "Evaluates dyslipidemia and pancreatitis risk. Normal reference value: < 1.7mmol/L, 1.7 - 2.3mmol/L is borderline elevation, > 2.3mmol/L is elevation.", + "clinical_cn": "评估血脂异常、胰腺炎风险,正常参考值<1.7mmol/L,1.7-2.3mmol/L为边缘升高,>2.3mmol/L为升高。" + }, + "LP(A)": { + "clinical_en": "An independent risk factor for cardiovascular diseases, related to atherosclerosis and thrombosis. Normal reference value: < 300mg/L.", + "clinical_cn": "独立的心血管疾病危险因素,与动脉粥样硬化、血栓形成相关,正常参考值<300mg/L。" + }, + "HDL": { + "clinical_en": "\"Good cholesterol\", promotes reverse cholesterol transport, reduces the risk of cardiovascular diseases. 
Normal reference value: > 1.0mmol/L (males), > 1.3mmol/L (females).", + "clinical_cn": "“好胆固醇”,促进胆固醇逆向转运,降低心血管疾病风险,正常参考值>1.0mmol/L(男性)、>1.3mmol/L(女性)。" + }, + "LDL": { + "clinical_en": "\"Bad cholesterol\", a core factor causing atherosclerosis, a key indicator for evaluating the risk of cardiovascular diseases. Normal reference value: < 3.4mmol/L (appropriate level), 3.4 - 4.1mmol/L is borderline elevation, > 4.1mmol/L is elevation.", + "clinical_cn": "“坏胆固醇”,致动脉粥样硬化核心因素,评估心血管疾病风险关键指标,正常参考值<3.4mmol/L(合适水平),3.4-4.1mmol/L为边缘升高,>4.1mmol/L为升高。" + }, + "BLOOD TYPE": { + "clinical_en": "Used for blood type identification to ensure blood transfusion safety (blood type matching is required during blood transfusion), and also used in fields such as organ transplantation and forensic medicine.", + "clinical_cn": "用于血型鉴定,保障输血安全(输血时需血型匹配),也在器官移植、法医学等领域有应用。" + }, + "BLOOD TYPE RH": { + "clinical_en": "The RH blood type system is important. Most people are RH positive. RH negative people (\"panda blood\") need special attention during blood transfusion, and it is also related to neonatal hemolytic disease (which may occur when the mother is RH negative and the fetus is RH positive).", + "clinical_cn": "RH血型系统重要,大部分人RH阳性,RH阴性者(熊猫血)输血时需特殊注意,也与新生儿溶血病(母亲RH阴性,胎儿RH阳性时可能发生)相关。" + }, + "PT": { + "clinical_en": "Reflects the function of the extrinsic coagulation pathway and the common coagulation pathway, used to evaluate vitamin K deficiency, liver disease, oral anticoagulants (such as warfarin), etc. 
The normal reference value varies depending on the detection method, generally 11 - 13 seconds, and more than 3 seconds longer than the normal control has clinical significance.", + "clinical_cn": "反映外源性凝血途径及共同凝血途径的功能,用于评估维生素K缺乏、肝病、口服抗凝剂(如华法林)等情况,正常参考值因检测方法而异,一般11-13秒,超过正常对照3秒有临床意义。" + }, + "APTT": { + "clinical_en": "Reflects the function of the intrinsic coagulation pathway and the common coagulation pathway, used to diagnose hemophilia (deficiency of intrinsic coagulation factors), evaluate liver disease, monitor heparin therapy, etc. The normal range is 30 - 45 seconds, and more than 10 seconds longer than the normal control is significant.", + "clinical_cn": "反映内源性凝血途径及共同凝血途径的功能,用于诊断血友病(内源性凝血因子缺乏)、评估肝病、监测肝素治疗等,正常30-45秒,超过正常对照10秒有意义。" + }, + "TT": { + "clinical_en": "Reflects the process of converting fibrinogen into fibrin, affected by fibrinogen content, structure, and anticoagulants in the blood. The normal range is 16 - 18 seconds, and more than 3 seconds longer than the normal control is significant.", + "clinical_cn": "反映纤维蛋白原转化为纤维蛋白的过程,受纤维蛋白原含量、结构及血液中抗凝物质影响,正常16-18秒,超过正常对照3秒有意义。" + }, + "FIB": { + "clinical_en": "Fibrinogen is a coagulation factor synthesized by the liver and the precursor of fibrin, reflecting the body's coagulation function. It increases in inflammation and thrombotic states, and decreases in severe liver disease and DIC. The normal reference value is generally 2 - 4g/L, and there are differences in different detection methods.", + "clinical_cn": "纤维蛋白原是肝脏合成的凝血因子,是纤维蛋白的前体,反映体内凝血功能,炎症、血栓状态时升高,严重肝病、DIC时降低,正常参考值一般2-4g/L,不同检测方法有差异。" + }, + "HIV": { + "clinical_en": "Used for the screening and diagnosis of AIDS, to determine whether the person is infected with HIV. It is normally negative.", + "clinical_cn": "用于艾滋病(AIDS)的筛查、诊断,判断是否感染HIV,正常为阴性。" + }, + "TRUST": { + "clinical_en": "It is a non - specific antibody test for syphilis, used for syphilis screening and curative effect observation. 
It is normally negative.", + "clinical_cn": "是梅毒非特异性抗体检测,用于梅毒筛查、疗效观察,正常为阴性。" + }, + "TPPA": { + "clinical_en": "A test for specific antibodies against Treponema pallidum, used for the diagnosis of syphilis. After syphilis infection, specific antibodies are generally positive for life. It is normally negative.", + "clinical_cn": "梅毒螺旋体特异性抗体检测,用于梅毒确诊,感染梅毒后,特异性抗体一般终身阳性,正常为阴性。" + }, + "HBSAG": { + "clinical_en": "It is a marker of hepatitis B virus infection, used for hepatitis B screening and diagnosis. It is normally negative.", + "clinical_cn": "是乙肝病毒感染的标志,用于乙肝筛查、诊断,正常为阴性。" + }, + "HBSAB": { + "clinical_en": "It is a protective antibody against hepatitis B, indicating immunity to hepatitis B virus. It is normally negative or has a protective titer (generally, ≥10mIU/ml is considered to have protective power).", + "clinical_cn": "是乙肝保护性抗体,提示对乙肝病毒有免疫力,正常为阴性或有保护性滴度(一般认为≥10mIU/ml有保护力)。" + }, + "HBEAG": { + "clinical_en": "Reflects the degree of hepatitis B virus replication and the strength of infectivity. It is normally negative.", + "clinical_cn": "反映乙肝病毒复制活跃程度、传染性强弱,正常为阴性。" + }, + "HBEAB": { + "clinical_en": "Indicates that hepatitis B virus replication is inhibited and infectivity is reduced. It is normally negative.", + "clinical_cn": "表示乙肝病毒复制受抑制,传染性降低,正常为阴性。" + }, + "HBCAB": { + "clinical_en": "It is a marker of hepatitis B virus infection. As long as the hepatitis B virus has been infected (whether current infection or past infection), the core antibody is mostly positive. It is normally negative.", + "clinical_cn": "是乙肝病毒感染的标志,只要感染过乙肝病毒(无论现症感染还是既往感染),核心抗体多为阳性,正常为阴性。" + }, + "HCV-IGM": { + "clinical_en": "Used for the diagnosis and condition monitoring of acute hepatitis C virus (HCV) infection. 
It is normally negative.", + "clinical_cn": "用于丙肝病毒(HCV)急性感染的诊断、病情监测,正常为阴性。" + }, + "KALIUM": { + "clinical_en": "Maintains cell physiological functions (such as osmotic pressure, acid - base balance, neuromuscular excitability, etc.), and is particularly important for heart function. The normal serum potassium is 3.5 - 5.5mmol/L.", + "clinical_cn": "维持细胞生理功能(如渗透压、酸碱平衡、神经肌肉兴奋性等),对心脏功能尤其重要,正常血清钾3.5-5.5mmol/L。" + }, + "SODIUM": { + "clinical_en": "Maintains extracellular fluid osmotic pressure, acid - base balance, participates in nerve conduction, maintains blood volume, etc. The normal serum sodium is 135 - 145mmol/L.", + "clinical_cn": "维持细胞外液渗透压、酸碱平衡,参与神经传导、维持血容量等,正常血清钠135-145mmol/L。" + }, + "CHLORIDE": { + "clinical_en": "Maintains body fluid osmotic pressure, acid - base balance, participates in gastric acid formation, etc. The normal serum chloride is 96 - 108mmol/L.", + "clinical_cn": "维持体液渗透压、酸碱平衡,参与胃酸形成等,正常血清氯96-108mmol/L。" + }, + "CALCIUM": { + "clinical_en": "Maintains bone hardness, neuromuscular excitability, coagulation function, etc. The normal total serum calcium is 2.25 - 2.58mmol/L, and ionized calcium is 1.10 - 1.34mmol/L.", + "clinical_cn": "维持骨骼硬度、神经肌肉兴奋性、凝血功能等,正常血清总钙2.25-2.58mmol/L,离子钙1.10-1.34mmol/L。" + }, + "MAGNESIUM": { + "clinical_en": "Participates in the regulation of various enzyme activities, maintains neuromuscular excitability, heart function, etc. The normal serum magnesium is 0.75 - 1.02mmol/L.", + "clinical_cn": "参与多种酶活性调节,维持神经肌肉兴奋性、心脏功能等,正常血清镁0.75-1.02mmol/L。" + }, + "PHOSPHORUS": { + "clinical_en": "Participates in bone formation, energy metabolism, nucleic acid synthesis, etc. The normal serum phosphorus is 0.97 - 1.61mmol/L.", + "clinical_cn": "参与骨骼形成、能量代谢、核酸合成等,正常血清磷0.97-1.61mmol/L。" + }, + "TP": { + "clinical_en": "Reflects liver synthetic function and body protein reserves, assisting in the diagnosis of liver and kidney diseases and nutritional status. 
Normal reference value: 60 - 80g/L (there may be differences in different detection methods).", + "clinical_cn": "反映肝脏合成功能及体内蛋白储备,协助诊断肝肾疾病、营养状态。正常参考值:60-80g/L(不同检测方法可能有差异)。" + }, + "A": { + "clinical_en": "The main protein synthesized by the liver, reflecting liver synthetic capacity and nutritional status, and maintaining plasma colloid osmotic pressure. Normal reference value: 40 - 55g/L.", + "clinical_cn": "肝脏合成的主要蛋白,体现肝脏合成能力、营养状态,维持血浆胶体渗透压。正常参考值:40-55g/L。" + }, + "G": { + "clinical_en": "Participates in immunity, reflects the body's immune status, and together with albumin, judges liver diseases and immune diseases. Normal reference value: 20 - 30g/L.", + "clinical_cn": "参与免疫,反映机体免疫状态,与白蛋白协同判断肝脏疾病、免疫疾病。正常参考值:20-30g/L。" + }, + "A/G": { + "clinical_en": "Evaluates liver function and protein metabolism balance. The normal ratio is 1.5 - 2.5:1.", + "clinical_cn": "评估肝脏功能、蛋白代谢平衡,正常比值1.5-2.5:1。" + }, + "TBIL": { + "clinical_en": "Reflects hepatocyte damage and bile excretion, assisting in the diagnosis of jaundice types (hemolytic, hepatocellular, obstructive). Normal reference value: 3.4 - 17.1μmol/L.", + "clinical_cn": "反映肝细胞损伤、胆汁排泄情况,协助诊断黄疸类型(溶血性、肝细胞性、梗阻性)。正常参考值:3.4-17.1μmol/L。" + }, + "DBIL": { + "clinical_en": "Also known as conjugated bilirubin, reflects hepatocyte processing of bilirubin and biliary excretion function, assisting in jaundice identification. Normal reference value: 0 - 6.8μmol/L.", + "clinical_cn": "又称结合胆红素,反映肝细胞处理胆红素及胆道排泄功能,辅助黄疸鉴别。正常参考值:0-6.8μmol/L。" + }, + "IBIL": { + "clinical_en": "Also known as unconjugated bilirubin, reflects red blood cell destruction and the liver's initial metabolic capacity, assisting in jaundice identification. 
Normal reference value: 3.4 - 10.2μmol/L (calculated by TBi - DBi).", + "clinical_cn": "又称非结合胆红素,反映红细胞破坏及肝脏初步代谢能力,辅助黄疸鉴别。正常参考值:3.4-10.2μmol/L(TBi-DBi计算得)。" + }, + "ALP": { + "clinical_en": "Reflects hepatocyte damage and biliary obstruction, and is also used to diagnose bone metabolic diseases (such as rickets, bone tumors). Normal reference value: 45 - 125U/L (adults); children and adolescents may have higher values due to bone development.", + "clinical_cn": "反映肝细胞损伤、胆道梗阻,也用于诊断骨代谢疾病(如佝偻病、骨肿瘤)。正常参考值:45-125U/L(成人);儿童、青少年因骨骼发育,数值可更高。" + }, + "ALT": { + "clinical_en": "A sensitive indicator of hepatocyte damage, which increases in the early stage of hepatitis, drug - induced liver injury, etc. Normal reference value: 7 - 40U/L (there may be differences in different reagents).", + "clinical_cn": "肝细胞损伤敏感指标,肝炎、药物肝损伤等早期即升高。正常参考值:7-40U/L(不同试剂有差异)。" + }, + "AST": { + "clinical_en": "An indicator of myocardial, liver, and skeletal muscle damage. It increases in myocardial infarction, and can also be abnormal in liver diseases and muscle diseases. Normal reference value: 13 - 35U/L.", + "clinical_cn": "心肌、肝脏、骨骼肌损伤指标,心肌梗死时升高,肝病、肌肉疾病也可异常。正常参考值:13-35U/L。" + }, + "GGT": { + "clinical_en": "Reflects hepatocyte damage and biliary obstruction, and is of auxiliary value in the diagnosis of alcoholic liver disease and liver cancer. Normal reference value: 7 - 45U/L (adults).", + "clinical_cn": "反映肝细胞损伤、胆道梗阻,对酒精性肝病、肝癌诊断有辅助价值。正常参考值:7-45U/L(成人)。" + }, + "SCR": { + "clinical_en": "Reflects glomerular filtration function, evaluates the degree of renal function damage (one of the bases for chronic renal failure staging). Normal reference value: 53 - 106μmol/L for males, 44 - 97μmol/L for females (due to differences in muscle mass).", + "clinical_cn": "反映肾小球滤过功能,评估肾功能损伤程度(慢性肾衰分期依据之一)。正常参考值:男性53-106μmol/L,女性44-97μmol/L(因肌肉量有差异)。" + }, + "BUN": { + "clinical_en": "Reflects glomerular filtration function and protein metabolism status (greatly affected by diet and extrarenal factors). 
Normal reference value: 3.2 - 7.1mmol/L.", + "clinical_cn": "反映肾小球滤过功能、蛋白质代谢状态(受饮食、肾外因素影响大)。正常参考值:3.2-7.1mmol/L。" + }, + "UA": { + "clinical_en": "Reflects the end product of purine metabolism and renal excretion. Elevated levels (hyperuricemia) are associated with gout, kidney stones, metabolic syndrome, hypertension, and cardiovascular disease.", + "clinical_cn": "反映嘌呤代谢终产物及肾脏排泄功能。升高(高尿酸血症)常与痛风、肾结石、代谢综合征、高血压及心血管疾病相关。" + }, + "CK": { + "clinical_en": "A specific indicator of myocardial and skeletal muscle damage, early diagnosis of myocardial infarction (increases 4 - 6h, peaks 1 - 2 days), and also used to diagnose myopathy (such as myositis, progressive muscular dystrophy). Normal reference value: 50 - 310U/L for males, 40 - 200U/L for females.", + "clinical_cn": "心肌、骨骼肌损伤特异性指标,心肌梗死早期诊断(4-6h升高,1-2天达峰),也用于诊断肌病(如肌炎、进行性肌营养不良)。正常参考值:男性50-310U/L,女性40-200U/L。" + }, + "LDH": { + "clinical_en": "An indicator of myocardial, liver, skeletal muscle, and hematological system diseases. It can increase in myocardial infarction (increases 12 - 24h, recovers 1 - 2 weeks), liver diseases, and hemolytic anemia. Normal reference value: 120 - 250U/L.", + "clinical_cn": "心肌、肝脏、骨骼肌、血液系统疾病指标,心肌梗死(12-24h升高,1-2周恢复)、肝病、溶血性贫血均可升高。正常参考值:120-250U/L。" + }, + "CK-MB": { + "clinical_en": "A specific indicator of myocardial damage, early diagnosis of myocardial infarction (increases 3 - 4h, peaks 1 - 2 days), better than total CK. Normal reference value: 0 - 25U/L (activity method); < 4ng/ml (mass method, more accurate).", + "clinical_cn": "心肌损伤特异性指标,心肌梗死早期诊断(3-4h升高,1-2天达峰),优于总CK。正常参考值:0-25U/L(活性法);<4ng/ml(质量法,更精准)。" + }, + "T3": { + "clinical_en": "Reflects thyroid function, increases in hyperthyroidism and decreases in hypothyroidism, and is also used to evaluate the severity of the disease (such as significantly high T3 in thyroid storm). 
Normal reference value: 1.3 - 3.1nmol/L.", + "clinical_cn": "反映甲状腺功能,甲亢时升高、甲减时降低,也用于评估病情严重度(如甲亢危象T3显著高)。正常参考值:1.3-3.1nmol/L。" + }, + "T4": { + "clinical_en": "Reflects thyroid function, increases in hyperthyroidism and decreases in hypothyroidism. Because T4 has a long half - life, it more stably reflects thyroid reserve function. Normal reference value: 12 - 22pmol/L.", + "clinical_cn": "反映甲状腺功能,甲亢时升高、甲减时降低,因T4半衰期长,更稳定反映甲状腺储备功能。正常参考值:12-22pmol/L。" + }, + "FT3": { + "clinical_en": "Reflects the \"active indicator\" of thyroid function, not affected by binding proteins, more accurately diagnosing hyperthyroidism / hypothyroidism. Normal reference value: 3.1 - 6.8pmol/L.", + "clinical_cn": "反映甲状腺功能的“活性指标”,不受结合蛋白影响,更精准诊断甲亢/甲减。正常参考值:3.1-6.8pmol/L。" + }, + "FT4": { + "clinical_en": "The \"active indicator\" of thyroid function, stably reflects thyroid hormone levels, and is a core indicator for the diagnosis and efficacy monitoring of hyperthyroidism / hypothyroidism. Normal reference value: 12 - 22pmol/L.", + "clinical_cn": "甲状腺功能“活性指标”,稳定反映甲状腺激素水平,甲亢/甲减诊断、疗效监测核心指标。正常参考值:12-22pmol/L。" + }, + "TSH": { + "clinical_en": "Reflects the function of the pituitary - thyroid axis, decreases in hyperthyroidism and increases in hypothyroidism, and is the preferred screening indicator for hypothyroidism (sensitive). Normal reference value: 0.27 - 4.2mIU/L.", + "clinical_cn": "反映垂体-甲状腺轴功能,甲亢时降低、甲减时升高,是甲减首选筛查指标(敏感)。正常参考值:0.27-4.2mIU/L。" + }, + "HOMOCYSTEINE": { + "clinical_en": "An independent risk factor for cardiovascular and cerebrovascular diseases, elevation increases the risk of atherosclerosis, cerebral infarction, and coronary heart disease. 
Normal reference value: 5 - 15μmol/L.", + "clinical_cn": "心脑血管疾病独立危险因素,升高增加动脉粥样硬化、脑梗死、冠心病风险,正常参考值5-15μmol/L。" + }, + "D-DIMER": { + "clinical_en": "Reflects the body's coagulation and fibrinolytic activity, assisting in the diagnosis of pulmonary embolism, deep vein thrombosis, etc. The normal reference value is < 0.5mg/L (ELISA method).", + "clinical_cn": "反映体内凝血和纤溶活性,辅助诊断肺栓塞、深静脉血栓等,正常参考值<0.5mg/L。" + }, + "25-OH-VD2+D3": { + "clinical_en": "Reflects the body’s vitamin D status and calcium-phosphorus metabolism. It is the most reliable marker for evaluating vitamin D reserves. Deficiency is associated with rickets, osteomalacia, osteoporosis, and increased risks of fractures and certain chronic diseases. The optimal reference range is usually 30–100 ng/mL, with <20 ng/mL indicating deficiency.", + "clinical_cn": "反映机体维生素D水平及钙磷代谢状态,是评估维生素D储备最可靠的指标。缺乏与佝偻病、骨软化症、骨质疏松及骨折风险增加等相关,也可能与部分慢性疾病发生率升高有关。正常参考范围通常为 30–100 ng/mL,<20 ng/mL 提示缺乏。" + }, + "PTH": { + "clinical_en": "Regulates calcium and phosphorus metabolism, maintains blood calcium stability, and diagnoses hyperparathyroidism / hypoparathyroidism. Normal reference value: 15 - 65pg/ml.", + "clinical_cn": "调节钙磷代谢,维持血钙稳定,诊断甲状旁腺功能亢进/减退,正常参考值15-65pg/ml。" + }, + "OST": { + "clinical_en": "Serves as a sensitive marker of bone formation and osteoblastic activity. Elevated levels suggest active bone metabolism (e.g., during growth or fracture healing), while reduced levels may indicate impaired bone formation, osteoporosis, or metabolic bone disease. It is also used to monitor the efficacy of anti-osteoporosis treatments.", + "clinical_cn": "是反映骨形成和成骨细胞活性的敏感指标。升高提示骨代谢活跃(如儿童生长期或骨折愈合期),降低则可能提示骨形成不足、骨质疏松或代谢性骨病。常用于骨质疏松治疗过程中的疗效监测。" + }, + "TPINP": { + "clinical_en": "A bone formation marker, reflecting the synthesis of type I collagen, evaluating bone metabolism and the effect of osteoporosis treatment. 
The normal reference value varies depending on the detection method, generally 14 - 42ng/ml for adult males, 10 - 35ng/ml for premenopausal females, and 19 - 71ng/ml for postmenopausal females.", + "clinical_cn": "骨形成标志物,反映I型胶原合成情况,评估骨代谢、骨质疏松治疗效果,正常参考值因检测方法而异,一般成人男性14-42ng/ml,女性绝经前10-35ng/ml,绝经后19-71ng/ml。" + }, + "Β-CTX": { + "clinical_en": "A bone resorption marker, reflecting bone collagen degradation, evaluating osteoporosis and bone turnover rate. The normal reference value is 0.13 - 0.47ng/ml for adult males, 0.11 - 0.43ng/ml for premenopausal females, and 0.23 - 0.78ng/ml for postmenopausal females.", + "clinical_cn": "骨吸收标志物,反映骨胶原降解情况,评估骨质疏松、骨转换率,正常参考值成人男性0.13-0.47ng/ml,女性绝经前0.11-0.43ng/ml,绝经后0.23-0.78ng/ml。" + }, + "PB": { + "clinical_en": "Evaluates lead exposure and lead poisoning risk. Excessive lead can damage the nervous system, blood system, digestive system, etc.", + "clinical_cn": "评估铅暴露及铅中毒风险,铅过量会损害神经系统、血液系统、消化系统等。" + }, + "CU": { + "clinical_en": "Participates in the synthesis of various enzymes (such as ceruloplasmin, superoxide dismutase), evaluates nutritional status, liver diseases, and genetic metabolic diseases (such as Wilson's disease).", + "clinical_cn": "参与多种酶(如铜蓝蛋白、超氧化物歧化酶)合成,评估营养状态、肝脏疾病、遗传代谢病(如肝豆状核变性)。" + }, + "ZN": { + "clinical_en": "Participates in growth and development, immune function, and substance metabolism, evaluates nutritional status, immune diseases, and hereditary zinc deficiency diseases.", + "clinical_cn": "参与生长发育、免疫功能、物质代谢,评估营养状态、免疫疾病、遗传性锌缺乏病。" + }, + "MG": { + "clinical_en": "Participates in enzyme activity, neuromuscular function, and bone metabolism, evaluates electrolyte balance and nutritional status.", + "clinical_cn": "参与酶活性、神经肌肉功能、骨代谢,评估电解质平衡、营养状态。" + }, + "FE": { + "clinical_en": "Participates in hemoglobin synthesis and oxygen transport, evaluates iron - deficiency anemia and iron overload (such as hemochromatosis).", + "clinical_cn": "参与血红蛋白合成、氧运输,评估缺铁性贫血、铁过载(如血色病)。" + }, + "CD3+": { + "clinical_en": 
"Reflects the total number of T cells, evaluates cellular immune function, and assists in the diagnosis of immunodeficiency diseases, autoimmune diseases, and tumor immune status.", + "clinical_cn": "反映总T细胞数量,评估细胞免疫功能,辅助诊断免疫缺陷病、自身免疫病、肿瘤免疫状态。" + }, + "CD4+": { + "clinical_en": "Assists in immune response, is a core indicator for AIDS diagnosis and condition monitoring, and is also used to evaluate autoimmune diseases and tumor immune status.", + "clinical_cn": "辅助免疫应答,是艾滋病诊断、病情监测核心指标,也用于评估自身免疫病、肿瘤免疫状态。" + }, + "CD8+": { + "clinical_en": "Kills infected cells and tumor cells; used to evaluate immune function, tumor immune surveillance, and viral infection status.", + "clinical_cn": "杀伤感染细胞、肿瘤细胞,评估免疫功能、肿瘤免疫监视、病毒感染状态。" + }, + "IGG": { + "clinical_en": "The most abundant immunoglobulin in the body, involved in anti-infection and autoimmune regulation; used to assess humoral immune function, diagnose autoimmune diseases, and infectious diseases.", + "clinical_cn": "体内含量最高的免疫球蛋白,参与抗感染、自身免疫调节,评估体液免疫功能、诊断自身免疫病、感染性疾病。" + }, + "IGA": { + "clinical_en": "Participates in mucosal immunity (respiratory and gastrointestinal tracts); used for evaluating infections, autoimmune diseases, and allergic disorders.", + "clinical_cn": "参与黏膜免疫(如呼吸道、消化道),评估感染、自身免疫病、过敏性疾病。" + }, + "IGM": { + "clinical_en": "An early-response immunoglobulin during acute infection; used to evaluate acute infections, autoimmune diseases, and hematological disorders (e.g., Waldenström’s macroglobulinemia).", + "clinical_cn": "感染早期快速应答的免疫球蛋白,评估急性感染、自身免疫病、血液病(如巨球蛋白血症)。" + }, + "IGE": { + "clinical_en": "Involved in allergic diseases and immune response against parasitic infections; used to assess allergic status and parasitic diseases.", + "clinical_cn": "参与过敏性疾病、寄生虫感染免疫应答,评估过敏状态、寄生虫病。" + }, + "C3": { + "clinical_en": "A central component of the complement system; plays a role in immune defense and 
regulation; used in evaluating autoimmune diseases, infectious diseases, and renal disorders.", + "clinical_cn": "补体系统核心成分,参与免疫防御、免疫调节,评估自身免疫病、感染性疾病、肾脏疾病。" + }, + "C4": { + "clinical_en": "Involved in activation of the classical complement pathway; aids in the diagnosis of autoimmune diseases (e.g., systemic lupus erythematosus) and hereditary complement deficiencies.", + "clinical_cn": "参与补体经典途径激活,辅助诊断自身免疫病(如系统性红斑狼疮)、遗传性补体缺陷病。" + }, + "CRP": { + "clinical_en": "An acute-phase protein; rapidly assesses infection, inflammation, and tissue injury.", + "clinical_cn": "急性时相反应蛋白,快速评估感染、炎症、组织损伤程度。" + }, + "ESR": { + "clinical_en": "Indirectly reflects inflammation, tissue damage, and anemia; assists in diagnosing infections, autoimmune diseases, and malignancies.", + "clinical_cn": "间接反映炎症、组织损伤、贫血等,辅助诊断感染、自身免疫病、肿瘤。" + }, + "ASO": { + "clinical_en": "Used to diagnose Group A β-hemolytic streptococcal infections (e.g., scarlet fever, rheumatic fever, acute glomerulonephritis); reflects recent infection history.", + "clinical_cn": "诊断A组溶血性链球菌感染(如猩红热、风湿热、急性肾小球肾炎),评估近期感染史。" + }, + "ANA": { + "clinical_en": "A screening marker for autoimmune diseases (particularly connective tissue diseases); used in evaluating systemic lupus erythematosus, rheumatoid arthritis, and Sjögren’s syndrome.", + "clinical_cn": "自身免疫病(尤其是结缔组织病)的筛查指标,评估系统性红斑狼疮、类风湿关节炎、干燥综合征等。" + }, + "RF": { + "clinical_en": "Helps in diagnosing rheumatoid arthritis; used to assess disease activity and prognosis.", + "clinical_cn": "辅助诊断类风湿关节炎,评估疾病活动度、预后。" + }, + "NEUT": { + "clinical_en": "Neutrophils are the main component of white blood cells and play a key role in defense against infection. Changes in their count and percentage reflect infection, inflammation, and hematopoietic conditions. 
The normal count is (2.0 - 7.5) × 10⁹/L, and the percentage is 50% - 70%.", + "clinical_cn": "中性粒细胞是白细胞主要成分,在抗感染中起关键作用,数量和百分比变化反映感染、炎症及造血情况,正常数量(2.0-7.5)×10⁹/L,百分比50%-70%。" + }, + "EOS": { + "clinical_en": "Eosinophils participate in immune processes such as allergic reactions and defense against parasitic infections. Changes in their count and percentage are significant for the diagnosis of allergic diseases, parasitic infections, etc. The normal count is (0.02 - 0.52) × 10⁹/L, and the percentage is 0.4% - 8%.", + "clinical_cn": "参与过敏反应、抗寄生虫感染等免疫过程,数量和百分比变化对过敏疾病、寄生虫感染等诊断有意义,正常数量(0.02-0.52)×10⁹/L,百分比0.4%-8%。" + }, + "BAS": { + "clinical_en": "Basophils participate in allergic reactions and release bioactive substances (such as histamine). Although their number is small, they are valuable for the diagnosis of allergies and certain blood diseases. The normal count is (0 - 0.06) × 10⁹/L, and the percentage is 0% - 1%.", + "clinical_cn": "参与过敏反应,释放生物活性物质(如组胺),数量少但对过敏、某些血液疾病诊断有价值,正常数量(0-0.06)×10⁹/L,百分比0%-1%。" + }, + "LYMPH": { + "clinical_en": "Lymphocytes are an important part of immune cells and participate in specific immunity (such as humoral immunity and cellular immunity). Changes in their count and percentage reflect immune status, infections, and blood diseases, etc. The normal count is (1.1 - 3.2) × 10⁹/L, and the percentage is 20% - 50%.", + "clinical_cn": "淋巴细胞是免疫细胞重要组成,参与特异性免疫(如体液免疫、细胞免疫),数量和百分比变化反映免疫状态、感染及血液疾病等,正常数量(1.1-3.2)×10⁹/L,百分比20%-50%。" + }, + "MONO": { + "clinical_en": "Monocytes are precursors of macrophages and play a role in defense against infection and in immune regulation, participating in the clearance of pathogens, foreign bodies, etc. 
The normal count is (0.1 - 0.6) × 10⁹/L, and the percentage is 3% - 10%.", + "clinical_cn": "单核细胞是巨噬细胞的前体,在抗感染、免疫调节中起作用,参与清除病原体、异物等,正常数量(0.1-0.6)×10⁹/L,百分比3%-10%。" + } +} \ No newline at end of file diff --git a/backend/test_baidu_ocr.py b/backend/test_baidu_ocr.py new file mode 100644 index 0000000..a8a6881 --- /dev/null +++ b/backend/test_baidu_ocr.py @@ -0,0 +1,370 @@ +# -*- coding: utf-8 -*- +""" +百度OCR识别测试脚本 +测试目标:使用百度OCR对产品报价表图片进行识别,验证识别效果 +""" + +import os +import sys +import io +import json +import time +from pathlib import Path + +# 修复 Windows 终端 UTF-8 输出 +sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace") +sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="replace") + +# 加载环境变量 +from dotenv import load_dotenv +load_dotenv(Path(__file__).parent / ".env") + +from aip import AipOcr + + +# ==================== 配置 ==================== + +APP_ID = os.getenv("BAIDU_OCR_APP_ID", "") +API_KEY = os.getenv("BAIDU_OCR_API_KEY", "") +SECRET_KEY = os.getenv("BAIDU_OCR_SECRET_KEY", "") + +# 测试图片路径 +IMAGE_PATH = r"C:\Users\UI\.cursor\projects\c-Users-UI-Desktop\assets\c__Users_UI_AppData_Roaming_Cursor_User_workspaceStorage_6df83b93d4a0651428307542725e79d8_images_ecdbe509-3f63-49c0-a8be-db9facaef857_3_-4dec6c0d-a755-4bda-8780-9e6b20e02df8.png" + + +def test_accurate_basic(client, image_data): + """测试1:通用文字识别(高精度版)- basicAccurate""" + print("\n" + "=" * 70) + print("[测试1] 通用文字识别(高精度版)- basicAccurate") + print("=" * 70) + + start = time.time() + result = client.basicAccurate(image_data) + elapsed = time.time() - start + + if "error_code" in result: + print(f" [FAIL] 错误 ({result['error_code']}): {result.get('error_msg', '未知错误')}") + return None + + words_result = result.get("words_result", []) + print(f" [OK] 识别成功 | 耗时: {elapsed:.2f}s | 识别行数: {len(words_result)}") + print("-" * 70) + for i, item in enumerate(words_result): + print(f" [{i+1:3d}] {item['words']}") + + return result + + +def test_accurate(client, 
image_data): + """测试2:通用文字识别(高精度含位置版)- accurate""" + print("\n" + "=" * 70) + print("[测试2] 通用文字识别(高精度含位置版)- accurate") + print("=" * 70) + + start = time.time() + result = client.accurate(image_data) + elapsed = time.time() - start + + if "error_code" in result: + print(f" [FAIL] 错误 ({result['error_code']}): {result.get('error_msg', '未知错误')}") + return None + + words_result = result.get("words_result", []) + print(f" [OK] 识别成功 | 耗时: {elapsed:.2f}s | 识别行数: {len(words_result)}") + print("-" * 70) + for i, item in enumerate(words_result): + loc = item.get("location", {}) + pos_str = f"(x={loc.get('left',0)}, y={loc.get('top',0)}, w={loc.get('width',0)}, h={loc.get('height',0)})" + print(f" [{i+1:3d}] {pos_str:40s} {item['words']}") + + return result + + +def test_general_basic(client, image_data): + """测试3:通用文字识别(标准版)- basicGeneral""" + print("\n" + "=" * 70) + print("[测试3] 通用文字识别(标准版)- basicGeneral") + print("=" * 70) + + start = time.time() + result = client.basicGeneral(image_data) + elapsed = time.time() - start + + if "error_code" in result: + print(f" [FAIL] 错误 ({result['error_code']}): {result.get('error_msg', '未知错误')}") + return None + + words_result = result.get("words_result", []) + print(f" [OK] 识别成功 | 耗时: {elapsed:.2f}s | 识别行数: {len(words_result)}") + print("-" * 70) + for i, item in enumerate(words_result): + print(f" [{i+1:3d}] {item['words']}") + + return result + + +def test_table_recognize(client, image_data): + """测试4:表格文字识别 - tableRecognition (异步)""" + print("\n" + "=" * 70) + print("[测试4] 表格文字识别 - tableRecognitionAsync") + print("=" * 70) + + # 提交表格识别请求 + start = time.time() + result = client.tableRecognitionAsync(image_data) + + if "error_code" in result: + print(f" [FAIL] 提交失败 ({result['error_code']}): {result.get('error_msg', '未知错误')}") + return None + + # tableRecognitionAsync 返回格式可能不同,兼容处理 + result_list = result.get("result", []) + if isinstance(result_list, list) and len(result_list) > 0: + request_id = result_list[0].get("request_id", "") + 
elif isinstance(result_list, dict): + request_id = result_list.get("request_id", "") + else: + request_id = "" + + if not request_id: + print(f" [FAIL] 未获取到 request_id,返回结果: {json.dumps(result, ensure_ascii=False)}") + return None + + print(f" [INFO] 提交成功 | request_id: {request_id}") + print(" [INFO] 等待识别结果...") + + # 轮询获取结果(最多等60秒) + ret_code = -1 + for attempt in range(20): + time.sleep(3) + get_result = client.getTableRecognitionResult(request_id) + + if "error_code" in get_result: + print(f" [FAIL] 查询失败 ({get_result['error_code']}): {get_result.get('error_msg', '')}") + return None + + percent = get_result.get("result", {}).get("percent", 0) + ret_code = get_result.get("result", {}).get("ret_code", -1) + + if ret_code == 3: + # 识别完成 + elapsed = time.time() - start + print(f" [OK] 识别完成 | 耗时: {elapsed:.2f}s") + + # 解析表格结果 + result_data = get_result.get("result", {}).get("result_data", "") + if result_data: + print("-" * 70) + print(" 表格识别结果(原始):") + try: + table_data = json.loads(result_data) + formatted = json.dumps(table_data, ensure_ascii=False, indent=2) + print(formatted[:5000]) + if len(formatted) > 5000: + print(" ... 
(结果过长,已截断)") + except Exception: + print(result_data[:5000]) + + return get_result + + print(f" 轮询 {attempt+1}/20 | 进度: {percent}%") + + elapsed = time.time() - start + print(f" [WARN] 超时(等待 {elapsed:.1f}s),最后状态: ret_code={ret_code}") + return None + + +def test_web_image(client, image_data): + """测试5:网络图片文字识别 - webImage""" + print("\n" + "=" * 70) + print("[测试5] 网络图片文字识别 - webImage") + print("=" * 70) + + start = time.time() + result = client.webImage(image_data) + elapsed = time.time() - start + + if "error_code" in result: + print(f" [FAIL] 错误 ({result['error_code']}): {result.get('error_msg', '未知错误')}") + return None + + words_result = result.get("words_result", []) + print(f" [OK] 识别成功 | 耗时: {elapsed:.2f}s | 识别行数: {len(words_result)}") + print("-" * 70) + for i, item in enumerate(words_result): + print(f" [{i+1:3d}] {item['words']}") + + return result + + +def test_table_sync(client, image_data): + """测试6:表格识别(同步版)- form""" + print("\n" + "=" * 70) + print("[测试6] 表格识别(同步版)- form") + print("=" * 70) + + start = time.time() + result = client.form(image_data) + elapsed = time.time() - start + + if "error_code" in result: + print(f" [FAIL] 错误 ({result['error_code']}): {result.get('error_msg', '未知错误')}") + return None + + forms_result = result.get("forms_result", []) + print(f" [OK] 识别成功 | 耗时: {elapsed:.2f}s | 表单数: {len(forms_result)}") + print("-" * 70) + + # 打印表格内容 + for f_idx, form in enumerate(forms_result): + print(f"\n === 表单 {f_idx + 1} ===") + header = form.get("header", []) + body = form.get("body", []) + footer = form.get("footer", []) + + if header: + print(" [表头]") + for row in header: + if isinstance(row, dict): + print(f" {row.get('words', row)}") + elif isinstance(row, list): + row_text = " | ".join( + cell.get("words", str(cell)) if isinstance(cell, dict) else str(cell) + for cell in row + ) + print(f" {row_text}") + + if body: + print(" [表体]") + for r_idx, row in enumerate(body[:80]): + if isinstance(row, dict): + print(f" {row.get('words', row)}") 
+ elif isinstance(row, list): + row_text = " | ".join( + cell.get("words", str(cell)) if isinstance(cell, dict) else str(cell) + for cell in row + ) + print(f" {row_text}") + if len(body) > 80: + print(f" ... (共 {len(body)} 行)") + + # 如果 forms_result 为空,打印原始结果 + if not forms_result: + print(f" 原始结果键: {list(result.keys())}") + formatted = json.dumps(result, ensure_ascii=False, indent=2) + print(formatted[:3000]) + + return result + + +def save_results(results, output_path): + """保存识别结果到JSON文件""" + with open(output_path, "w", encoding="utf-8") as f: + json.dump(results, f, ensure_ascii=False, indent=2) + print(f"\n[SAVE] 结果已保存到: {output_path}") + + +def main(): + print("=" * 70) + print("百度OCR识别测试 - 产品报价表图片") + print("=" * 70) + + # 检查配置 + if not all([APP_ID, API_KEY, SECRET_KEY]): + print("[FAIL] 百度OCR未配置,请检查 .env 文件中的 BAIDU_OCR_* 变量") + sys.exit(1) + + print(f" APP_ID: {APP_ID}") + print(f" API_KEY: {API_KEY[:8]}...") + print(f" SECRET_KEY: {SECRET_KEY[:8]}...") + + # 检查图片文件 + if not Path(IMAGE_PATH).exists(): + print(f"[FAIL] 图片文件不存在: {IMAGE_PATH}") + sys.exit(1) + + file_size = Path(IMAGE_PATH).stat().st_size + print(f" 图片路径: {IMAGE_PATH}") + print(f" 文件大小: {file_size / 1024:.1f} KB") + + # 初始化百度OCR客户端 + client = AipOcr(APP_ID, API_KEY, SECRET_KEY) + + # 读取图片 + with open(IMAGE_PATH, "rb") as f: + image_data = f.read() + + print(f" 图片数据: {len(image_data)} bytes") + + # 收集所有测试结果 + all_results = {} + + # ---- 测试1:高精度版 ---- + r1 = test_accurate_basic(client, image_data) + if r1: + all_results["accurate_basic"] = { + "method": "basicAccurate(高精度版)", + "lines": len(r1.get("words_result", [])), + "data": r1, + } + + # ---- 测试2:高精度含位置版 ---- + r2 = test_accurate(client, image_data) + if r2: + all_results["accurate_with_location"] = { + "method": "accurate(高精度含位置版)", + "lines": len(r2.get("words_result", [])), + "data": r2, + } + + # ---- 测试3:标准版 ---- + r3 = test_general_basic(client, image_data) + if r3: + all_results["general_basic"] = { + "method": "basicGeneral(标准版)", 
+ "lines": len(r3.get("words_result", [])), + "data": r3, + } + + # ---- 测试4:表格识别(异步) ---- + r4 = test_table_recognize(client, image_data) + if r4: + all_results["table_recognition_async"] = { + "method": "tableRecognitionAsync(表格识别-异步)", + "data": r4, + } + + # ---- 测试5:网络图片文字识别 ---- + r5 = test_web_image(client, image_data) + if r5: + all_results["web_image"] = { + "method": "webImage(网络图片文字识别)", + "lines": len(r5.get("words_result", [])), + "data": r5, + } + + # ---- 测试6:表格识别(同步) ---- + r6 = test_table_sync(client, image_data) + if r6: + all_results["table_sync"] = { + "method": "form(表格识别-同步)", + "data": r6, + } + + # ---- 汇总 ---- + print("\n" + "=" * 70) + print("测试汇总") + print("=" * 70) + for key, val in all_results.items(): + lines = val.get("lines", "N/A") + print(f" {val['method']:45s} 识别行数: {lines}") + + # 保存结果 + output_path = Path(__file__).parent / "test_baidu_ocr_results.json" + save_results(all_results, output_path) + + print("\n[DONE] 所有测试完成!") + + +if __name__ == "__main__": + main() diff --git a/backend/test_baidu_ocr_results.json b/backend/test_baidu_ocr_results.json new file mode 100644 index 0000000..7ef1187 --- /dev/null +++ b/backend/test_baidu_ocr_results.json @@ -0,0 +1,552 @@ +{ + "accurate_with_location": { + "method": "accurate(高精度含位置版)", + "lines": 60, + "data": { + "words_result": [ + { + "words": "南昌环宇产品报价", + "location": { + "top": 0, + "left": 287, + "width": 213, + "height": 15 + } + }, + { + "words": "2026.02.04仅供参考", + "location": { + "top": 2, + "left": 533, + "width": 156, + "height": 12 + } + }, + { + "words": "随时变化", + "location": { + "top": 0, + "left": 689, + "width": 61, + "height": 15 + } + }, + { + "words": "报价仅针对经销商价格仅供参考行情变化快不接受退货咨询电话:0791-86315357", + "location": { + "top": 20, + "left": 313, + "width": 259, + "height": 6 + } + }, + { + "words": "绿色降/红色涨/蓝色订货/黑色无变化", + "location": { + "top": 20, + "left": 604, + "width": 109, + "height": 6 + } + }, + { + "words": 
"达内存国态,麦国主板,艾尔莎板卡,华南金牌主板,惠视现代显示器,美奇显示器。2019年8月8日起,intel所有CPU均保修三年(散片/原盒),之前的按照标贴处理,所有硬盘(含国态/U盘)不对任何存储数负责", + "location": { + "top": 31, + "left": 195, + "width": 656, + "height": 7 + } + }, + { + "words": "1-11intad", + "location": { + "top": 43, + "left": 19, + "width": 34, + "height": 6 + } + }, + { + "words": "WONA5PLU5保三", + "location": { + "top": 45, + "left": 117, + "width": 81, + "height": 3 + } + }, + { + "words": "ST驶PC盘二年换", + "location": { + "top": 43, + "left": 236, + "width": 60, + "height": 6 + } + }, + { + "words": "菱光国三年取", + "location": { + "top": 43, + "left": 351, + "width": 45, + "height": 6 + } + }, + { + "words": "龙内存三年", + "location": { + "top": 43, + "left": 460, + "width": 44, + "height": 6 + } + }, + { + "words": "金达因品三年保", + "location": { + "top": 43, + "left": 586, + "width": 50, + "height": 6 + } + }, + { + "words": "U型/T卡系列", + "location": { + "top": 43, + "left": 717, + "width": 45, + "height": 6 + } + }, + { + "words": "WD20PX 5400RFM/128548", + "location": { + "top": 52, + "left": 105, + "width": 97, + "height": 6 + } + }, + { + "words": "013000068002100", + "location": { + "top": 54, + "left": 217, + "width": 74, + "height": 3 + } + }, + { + "words": "403400U", + "location": { + "top": 54, + "left": 431, + "width": 63, + "height": 3 + } + }, + { + "words": "83.0", + "location": { + "top": 52, + "left": 818, + "width": 29, + "height": 8 + } + }, + { + "words": "4F", + "location": { + "top": 61, + "left": 108, + "width": 26, + "height": 3 + } + }, + { + "words": "165", + "location": { + "top": 60, + "left": 515, + "width": 20, + "height": 6 + } + }, + { + "words": "达P202560", + "location": { + "top": 61, + "left": 581, + "width": 40, + "height": 3 + } + }, + { + "words": "275", + "location": { + "top": 60, + "left": 668, + "width": 8, + "height": 10 + } + }, + { + "words": "2", + "location": { + "top": 61, + "left": 690, + "width": 13, + "height": 3 + } + }, + { + "words": "600", + "location": { + "top": 61, + "left": 705, + "width": 
9, + "height": 3 + } + }, + { + "words": "2HDM", + "location": { + "top": 61, + "left": 956, + "width": 20, + "height": 3 + } + }, + { + "words": "8-0J57,28T47ALEL4/47", + "location": { + "top": 81, + "left": 110, + "width": 66, + "height": 3 + } + }, + { + "words": "达4P1803120r", + "location": { + "top": 81, + "left": 580, + "width": 55, + "height": 3 + } + }, + { + "words": "582", + "location": { + "top": 81, + "left": 716, + "width": 22, + "height": 3 + } + }, + { + "words": "2900", + "location": { + "top": 86, + "left": 192, + "width": 12, + "height": 10 + } + }, + { + "words": "T0HA/动", + "location": { + "top": 97, + "left": 226, + "width": 75, + "height": 3 + } + }, + { + "words": "4个U5", + "location": { + "top": 92, + "left": 824, + "width": 28, + "height": 8 + } + }, + { + "words": "内", + "location": { + "top": 117, + "left": 457, + "width": 51, + "height": 3 + } + }, + { + "words": "4000M00244T", + "location": { + "top": 120, + "left": 123, + "width": 49, + "height": 6 + } + }, + { + "words": "金士/T列三年月区", + "location": { + "top": 137, + "left": 703, + "width": 72, + "height": 3 + } + }, + { + "words": "79", + "location": { + "top": 140, + "left": 22, + "width": 6, + "height": 5 + } + }, + { + "words": "75120000007", + "location": { + "top": 161, + "left": 117, + "width": 37, + "height": 3 + } + }, + { + "words": "6", + "location": { + "top": 180, + "left": 9, + "width": 4, + "height": 4 + } + }, + { + "words": "/13400", + "location": { + "top": 181, + "left": 36, + "width": 18, + "height": 3 + } + }, + { + "words": "LD", + "location": { + "top": 181, + "left": 157, + "width": 20, + "height": 3 + } + }, + { + "words": "0", + "location": { + "top": 213, + "left": 670, + "width": 4, + "height": 6 + } + }, + { + "words": "000", + "location": { + "top": 228, + "left": 668, + "width": 8, + "height": 6 + } + }, + { + "words": "440", + "location": { + "top": 249, + "left": 669, + "width": 8, + "height": 3 + } + }, + { + "words": "0", + "location": { + "top": 260, 
+ "left": 217, + "width": 4, + "height": 4 + } + }, + { + "words": "KT三年", + "location": { + "top": 265, + "left": 124, + "width": 67, + "height": 3 + } + }, + { + "words": "D5疗三年会", + "location": { + "top": 265, + "left": 580, + "width": 62, + "height": 3 + } + }, + { + "words": "0", + "location": { + "top": 267, + "left": 304, + "width": 6, + "height": 6 + } + }, + { + "words": "GAMNG WPUUS05 ", + "location": { + "top": 269, + "left": 919, + "width": 68, + "height": 3 + } + }, + { + "words": "00", + "location": { + "top": 274, + "left": 197, + "width": 7, + "height": 3 + } + }, + { + "words": "同情机主版以倒三年", + "location": { + "top": 276, + "left": 934, + "width": 55, + "height": 6 + } + }, + { + "words": "XD", + "location": { + "top": 320, + "left": 840, + "width": 18, + "height": 3 + } + }, + { + "words": "130", + "location": { + "top": 396, + "left": 257, + "width": 14, + "height": 3 + } + }, + { + "words": "TT", + "location": { + "top": 460, + "left": 159, + "width": 10, + "height": 3 + } + }, + { + "words": "固店三年有", + "location": { + "top": 476, + "left": 589, + "width": 45, + "height": 3 + } + }, + { + "words": "1", + "location": { + "top": 480, + "left": 306, + "width": 3, + "height": 5 + } + }, + { + "words": "士05内三年", + "location": { + "top": 496, + "left": 438, + "width": 87, + "height": 3 + } + }, + { + "words": "GDU", + "location": { + "top": 500, + "left": 933, + "width": 28, + "height": 3 + } + }, + { + "words": "127:25", + "location": { + "top": 512, + "left": 618, + "width": 25, + "height": 3 + } + }, + { + "words": "MAIGU板三个月三年", + "location": { + "top": 516, + "left": 795, + "width": 101, + "height": 4 + } + }, + { + "words": "0", + "location": { + "top": 531, + "left": 885, + "width": 5, + "height": 5 + } + }, + { + "words": "PU三", + "location": { + "top": 536, + "left": 22, + "width": 60, + "height": 3 + } + }, + { + "words": "1", + "location": { + "top": 540, + "left": 244, + "width": 21, + "height": 3 + } + } + ], + "words_result_num": 60, + 
"log_id": 2019648663845167347 + } + } +} \ No newline at end of file diff --git a/backend/test_extraction_logic.py b/backend/test_extraction_logic.py new file mode 100644 index 0000000..ec23cf6 --- /dev/null +++ b/backend/test_extraction_logic.py @@ -0,0 +1,551 @@ +# -*- coding: utf-8 -*- +""" +测试提取逻辑 - 不调用OCR/DeepSeek API,纯本地测试 +测试内容: + 1. parse_medical_data_v2: OCR文本 → 检测项解析 + 2. classify_abb_module: ABB/项目名 → 模块分类(含中文关键词) + 3. match_with_template: 提取数据 → 模板匹配 +""" +import sys +import os +import io +import json + +# 修复 Windows 终端 UTF-8 +sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace") +sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="replace") + +# 确保 backend 目录在 path 中 +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +from parse_medical_v2 import parse_medical_data_v2, clean_extracted_data_v2 +from extract_and_fill_report import classify_abb_module, match_with_template + + +# ============================================================ +# 测试1: classify_abb_module - ABB硬编码映射 +# ============================================================ +def test_abb_mapping(): + """测试ABB硬编码映射能否正确分类""" + print("\n" + "=" * 70) + print("[测试1] ABB硬编码映射") + print("=" * 70) + + test_cases = [ + # (abb, project_name, expected_module) + # 尿检 + ("COLOR", "Color", "Urine Detection"), + ("PH", "pH", "Urine Detection"), + ("PRO", "Protein", "Urine Detection"), + ("SG", "Specific Gravity", "Urine Detection"), + # 血常规 + ("WBC", "White Blood Cell", "Complete Blood Count"), + ("RBC", "Red Blood Cell", "Complete Blood Count"), + ("HGB", "Hemoglobin", "Complete Blood Count"), + ("PLT", "Platelet Count", "Complete Blood Count"), + ("ESR", "ESR 1 Hour", "Complete Blood Count"), + # 肝功能 + ("ALT", "Alanine Aminotransferase", "Liver Function"), + ("AST", "Aspartate Aminotransferase", "Liver Function"), + ("GGT", "Gamma GT", "Liver Function"), + ("TBIL", "Total Bilirubin", "Liver Function"), + ("ALB", "Albumin", "Liver 
Function"), + # 肾功能 + ("BUN", "Blood Urea Nitrogen", "Kidney Function"), + ("CREA", "Creatinine", "Kidney Function"), + ("UA", "Uric Acid", "Kidney Function"), + # 血脂 + ("TC", "Total Cholesterol", "Lipid Panel"), + ("TG", "Triglyceride", "Lipid Panel"), + ("HDL", "HDL Cholesterol", "Lipid Panel"), + ("LDL", "LDL Cholesterol", "Lipid Panel"), + # 电解质 + ("NA", "Sodium", "Electrolytes"), + ("K", "Potassium", "Electrolytes"), + ("CL", "Chloride", "Electrolytes"), + ("CA", "Calcium", "Electrolytes"), + # 血糖 + ("FPG", "Fasting Glucose", "Glucose"), + ("HBA1C", "HbA1c", "Glucose"), + # 甲状腺 + ("TSH", "TSH", "Thyroid"), + ("FT3", "Free T3", "Thyroid"), + ("FT4", "Free T4", "Thyroid"), + # 激素 + ("E2", "Estradiol", "Hormone"), + ("FSH", "FSH", "Hormone"), + ("LH", "LH", "Hormone"), + ("CORTISOL", "Cortisol", "Hormone"), + # 肿瘤标志物 + ("AFP", "Alpha Fetoprotein", "Tumor Markers"), + ("CEA", "CEA", "Tumor Markers"), + ("CA125", "CA125", "Tumor Markers"), + ("PSA", "PSA", "Tumor Markers"), + # 凝血 + ("PT", "Prothrombin Time", "Coagulation"), + ("APTT", "APTT", "Coagulation"), + ("FIB", "Fibrinogen", "Coagulation"), + # 传染病 + ("HBSAG", "HBsAg", "Infectious Disease"), + ("HIV", "HIV", "Infectious Disease"), + # 免疫功能 + ("IGG", "IgG", "Immune Function"), + ("C3", "Complement C3", "Immune Function"), + ("CRP", "CRP", "Immune Function"), + # 骨代谢 + ("OSTE", "Osteocalcin", "Bone Metabolism"), + ("PTH", "PTH", "Bone Metabolism"), + # 重金属 + ("PB", "Lead", "Heavy Metals"), + ("HG", "Mercury", "Heavy Metals"), + # 维生素 + ("VITB12", "Vitamin B12", "Vitamin"), + ("FOLATE", "Folate", "Vitamin"), + # 同型半胱氨酸 + ("HCY", "Homocysteine", "Homocysteine"), + # 血型 + ("ABO", "ABO Blood Group", "Blood Type"), + ] + + passed = 0 + failed = 0 + for abb, project, expected in test_cases: + result = classify_abb_module(abb, project, api_key=None) + if result == expected: + passed += 1 + else: + failed += 1 + print(f" [FAIL] ABB={abb}, project={project}") + print(f" 期望: {expected}, 实际: {result}") + + print(f"\n 
结果: {passed} 通过, {failed} 失败 / 共 {len(test_cases)} 项") + return failed == 0 + + +# ============================================================ +# 测试2: classify_abb_module - 中文关键词匹配 +# ============================================================ +def test_chinese_keyword_matching(): + """测试中文关键词能否正确匹配模块""" + print("\n" + "=" * 70) + print("[测试2] 中文关键词匹配") + print("=" * 70) + + # 用不在ABB映射中的假ABB,强制走keyword匹配 + test_cases = [ + # (abb, project_name_cn, expected_module) + # 尿液 + ("X001", "尿液分析", "Urine Detection"), + ("X002", "尿检常规", "Urine Detection"), + ("X003", "隐血试验", "Urine Detection"), + ("X004", "酮体检测", "Urine Detection"), + # 血常规 + ("X010", "红细胞计数", "Complete Blood Count"), + ("X011", "白细胞分类", "Complete Blood Count"), + ("X012", "血红蛋白测定", "Complete Blood Count"), + ("X013", "血小板计数", "Complete Blood Count"), + ("X014", "中性粒细胞百分比", "Complete Blood Count"), + ("X015", "嗜酸性粒细胞", "Complete Blood Count"), + ("X016", "单核细胞计数", "Complete Blood Count"), + # 肝功能 + ("X020", "肝功能全套", "Liver Function"), + ("X021", "总蛋白测定", "Liver Function"), + ("X022", "白蛋白测定", "Liver Function"), + ("X023", "胆红素测定", "Liver Function"), + ("X024", "转氨酶检测", "Liver Function"), + ("X025", "谷氨酰转肽酶", "Liver Function"), + # 肾功能 + ("X030", "肾功能检测", "Kidney Function"), + ("X031", "血清肌酐", "Kidney Function"), + ("X032", "尿素氮测定", "Kidney Function"), + ("X033", "尿酸检测", "Kidney Function"), + # 血脂 + ("X040", "总胆固醇", "Lipid Panel"), + ("X041", "甘油三酯测定", "Lipid Panel"), + ("X042", "高密度脂蛋白", "Lipid Panel"), + ("X043", "血脂四项", "Lipid Panel"), + # 血糖 + ("X050", "空腹血糖测定", "Glucose"), + ("X051", "糖化血红蛋白检测", "Glucose"), + ("X052", "随机血糖", "Glucose"), + # 甲状腺 + ("X060", "甲状腺功能", "Thyroid"), + ("X061", "促甲状腺激素", "Thyroid"), + # 激素 + ("X070", "雌二醇测定", "Hormone"), + ("X071", "孕酮检测", "Hormone"), + ("X072", "睾酮水平", "Hormone"), + ("X073", "皮质醇测定", "Hormone"), + ("X074", "催乳素检测", "Hormone"), + ("X075", "荷尔蒙全套", "Hormone"), + ("X076", "促卵泡生成素", "Hormone"), + ("X077", "促黄体生成素", "Hormone"), + ("X078", "脱氢表雄酮硫酸盐", "Hormone"), 
+ ("X079", "胰岛素样生长因子", "Hormone"), + ("X080", "抗缪勒管激素", "Hormone"), + # 肿瘤标志物 + ("X090", "肿瘤标志物全套", "Tumor Markers"), + ("X091", "甲胎蛋白检测", "Tumor Markers"), + ("X092", "癌胚抗原测定", "Tumor Markers"), + ("X093", "铁蛋白检测", "Tumor Markers"), + ("X094", "糖类抗原125", "Tumor Markers"), + ("X095", "前列腺特异性抗原", "Tumor Markers"), + ("X096", "鳞状细胞癌抗原", "Tumor Markers"), + ("X097", "神经元特异性烯醇化酶", "Tumor Markers"), + # 凝血 + ("X100", "凝血功能检测", "Coagulation"), + ("X101", "纤维蛋白原测定", "Coagulation"), + # 传染病 + ("X110", "乙肝五项", "Infectious Disease"), + ("X111", "丙肝抗体", "Infectious Disease"), + ("X112", "梅毒筛查", "Infectious Disease"), + ("X113", "传染病四项", "Infectious Disease"), + # 免疫功能 + ("X120", "免疫球蛋白测定", "Immune Function"), + ("X121", "补体C3检测", "Immune Function"), + ("X122", "c反应蛋白测定", "Immune Function"), + ("X123", "抗核抗体检测", "Immune Function"), + ("X124", "类风湿因子测定", "Immune Function"), + ("X125", "红细胞沉降速率", "Immune Function"), + # 骨代谢 + ("X130", "骨代谢标志物", "Bone Metabolism"), + ("X131", "骨钙素检测", "Bone Metabolism"), + ("X132", "甲状旁腺激素", "Bone Metabolism"), + ("X133", "25-羟维生素d检测", "Bone Metabolism"), + # 重金属 + ("X140", "微量元素检测", "Heavy Metals"), + ("X141", "重金属筛查", "Heavy Metals"), + # 同型半胱氨酸 + ("X150", "同型半胱氨酸检测", "Homocysteine"), + # 血型 + ("X160", "ABO血型鉴定", "Blood Type"), + # 电解质 + ("X170", "电解质全套", "Electrolytes"), + ("X171", "血清钾测定", "Electrolytes"), + ("X172", "血清钠检测", "Electrolytes"), + ("X173", "血清钙测定", "Electrolytes"), + ] + + passed = 0 + failed = 0 + for abb, project, expected in test_cases: + result = classify_abb_module(abb, project, api_key=None) + if result == expected: + passed += 1 + else: + failed += 1 + print(f" [FAIL] project={project}") + print(f" 期望: {expected}, 实际: {result}") + + print(f"\n 结果: {passed} 通过, {failed} 失败 / 共 {len(test_cases)} 项") + return failed == 0 + + +# ============================================================ +# 测试3: parse_medical_data_v2 - OCR文本解析 +# ============================================================ +def test_parse_ocr_text(): + 
"""测试OCR文本解析能否正确提取检测项""" + print("\n" + "=" * 70) + print("[测试3] OCR文本解析 (parse_medical_data_v2)") + print("=" * 70) + + # 模拟典型的百度OCR提取文本(英文报告格式) + sample_ocr_text = """Page 1 +Patient Name: MR. TEST PATIENT +Sex : Male Age : 45Y +Collected Date/Time: 20 Jan 2025 + +Complete Blood Count +Total WBC............... 6.50 *10^3/mm3 (4.0-10.0) +Red Blood Cell.......... 4.69 *10^6/mm3 (4.5-5.5) +Hemoglobin(Hb)......... 14.2 g/dL (13.0-17.0) +Hematocrit(HCT)........ 41.3 % (40-54) +MCV.................... 88.1 fL (80-100) +MCH.................... 30.3 pg (27-34) +MCHC................... 34.4 g/dL (32-36) +Platelet Count......... 230 *10^3/mm3 (150-400) +Neutrophil............. 62.3 % (40-70) +Lymphocyte............. 28.5 % (20-40) +Monocyte............... 6.2 % (2-8) +Eosinophil............. 2.5 % (1-6) +Basophil............... 0.5 % (0-1) +ESR 1 Hour............. 8 mm/hr (0-15) + +Liver Function +ALT(Alanine Transaminase)...... 25 U/L (0-41) +AST(Aspartate Transaminase).... 22 U/L (0-40) +GGT( Gamma GT)................. 30 U/L (8-61) +ALP(Alkaline Phosphatase)...... 70 U/L (40-130) +Total Bilirubin................ 0.8 mg/dL (0.1-1.2) +Direct Bilirubin............... 0.2 mg/dL (0-0.3) +Total Protein.................. 7.2 g/dL (6.6-8.3) +Albumin........................ 4.5 g/dL (3.5-5.2) +Globulin....................... 2.7 g/dL (2.0-3.5) + +Kidney Function +BUN............................ 15 mg/dL (6-20) +Creatinine..................... 0.95 mg/dL (0.67-1.17) +Uric Acid...................... 5.8 mg/dL (3.4-7.0) +eGFR........................... 92 mL/min (>90) + +Lipid Profile +Total Cholesterol.............. 195 mg/dL (<200) +Triglyceride................... 120 mg/dL (<150) +HDL-Cholesterol................ 55 mg/dL (>40) +LDL-Cholesterol(Direct)........ 118 mg/dL (<100) + +Glucose(Fasting)............... 95 mg/dL (74-100) +HbA1c.......................... 5.7 % (4.0-5.6) + +Thyroid Function +TSH............................ 
2.15 mIU/L (0.27-4.2) +Free T3........................ 3.2 pg/mL (2.0-4.4) +Free T4........................ 1.25 ng/dL (0.93-1.7) + +Hormones +Estradiol(E2).................. 28.5 pg/mL (11.3-43.2) +Testosterone................... 450 ng/dL (249-836) +Cortisol....................... 12.5 ug/dL (6.2-19.4) +FSH............................ 5.85 mIU/mL (1.5-12.4) +LH(Luteinizing Hormone)....... 4.2 mIU/mL (1.7-8.6) +Prolactin...................... 8.5 ng/mL (4.0-15.2) +DHEA-Sulphate.................. 280 ug/dL (88.9-427) +IGF-1.......................... 165 ng/mL (101-267) + +Tumor Markers +AFP(Alpha Fetoprotein)......... 3.2 ng/mL (0-7) +CEA(Carcinoembryonic Antigen).. 2.1 ng/mL (0-5) +Total PSA...................... 0.8 ng/mL (0-4) +CA125.......................... 12.5 U/mL (0-35) + +Coagulation +Prothrombin Time(PT)........... 12.5 sec (10-14) +APTT........................... 28.3 sec (25-35) +Thrombin Time(TT).............. 16.2 sec (14-21) +Fibrinogen..................... 2.8 g/L (2.0-4.0) +INR............................ 0.93 (0.8-1.2) + +Infectious Disease +HBsAg(Hepatitis B Surface Antigen)... Negative +HBsAb(Hepatitis B Surface Antibody).. Positive +HCV Ab (Hepatitis C Antibody)........ Non Reactive +HIV-1/HIV-2 Antibody................. Non Reactive +RPR (Rapid Plasma Reagin)............ Non Reactive + +Electrolytes +Sodium......................... 140 mmol/L (136-145) +Potassium...................... 4.2 mmol/L (3.5-5.1) +Chloride....................... 103 mmol/L (98-107) +Calcium........................ 9.5 mg/dL (8.6-10.2) + +Immune Function +Immunoglobulin G(IgG).......... 1050 mg/dL (700-1600) +Immunoglobulin A(IgA).......... 220 mg/dL (70-400) +Immunoglobulin M(IgM).......... 95 mg/dL (40-230) +Complement C3(B1C)............. 110 mg/dL (90-180) +Complement C4.................. 28 mg/dL (10-40) +C-Reactive Protein(High Sens).. 0.5 mg/L (<3) + +Bone Metabolism +N-mid Osteocalcin.............. 15.2 ng/mL (14-46) +PTH(Intact).................... 
35 pg/mL (15-65) +Vitamin D(25-OH Vitamin D Total) 32 ng/mL (30-100) + +Blood Type +ABO Group...................... A +Rh Group....................... Positive + +Homocysteine................... 10.5 umol/L (5-15) + +Vitamin B12.................... 450 pg/mL (197-771) +Folate......................... 12.3 ng/mL (>3.0) +""" + + items = parse_medical_data_v2(sample_ocr_text, "test_sample.pdf") + items = clean_extracted_data_v2(items) + + print(f" Parsed {len(items)} test items") + + # Key ABBs the parser is expected to extract at minimum + expected_abbs = { + 'WBC', 'RBC', 'Hb', 'HCT', 'MCV', 'MCH', 'MCHC', 'PLT', + 'NEUT', 'LYMPH', 'MONO', 'EOS', 'BAS', 'ESR', + 'ALT', 'AST', 'GGT', 'ALP', 'TBil', 'DBil', 'TP', 'ALB', 'GLB', + 'BUN', 'Scr', 'UA', 'eGFR', + 'TC', 'TG', 'HDL', 'LDL', + 'FBS', 'HbA1C', + 'TSH', 'FT3', 'FT4', + 'E2', 'T', 'COR', 'FSH', 'LH', 'PRL', 'DHEAS', 'IGF-1', + 'AFP', 'CEA', 'TPSA', 'CA125', + 'PT', 'APTT', 'TT', 'FIB', 'INR', + 'HBsAg', 'HBsAb', 'HCV', 'HIV', 'TRUST', + 'Na', 'K', 'Cl', 'Ca', + 'IgG', 'IgA', 'IgM', 'C3', 'C4', 'hs-CRP', + 'OST', 'PTH', '25-OH-VD2+D3', + 'ABO', 'Rh', + 'Hcy', + 'VitB12', 'Folate', + } + + found_abbs = {item['abb'] for item in items} + matched = expected_abbs & found_abbs + missing = expected_abbs - found_abbs + extra = found_abbs - expected_abbs + + print(f" Expected {len(expected_abbs)} ABBs") + print(f" Matched {len(matched)}") + + if missing: + print(f" [WARN] {len(missing)} not matched: {sorted(missing)}") + if extra: + print(f" [INFO] {len(extra)} extra items recognized: {sorted(extra)}") + + # Print details of every parsed item + print(f"\n {'ABB':<15} {'Result':<12} {'Flag':<4} {'Unit':<20} {'Reference'}") + print(" " + "-" * 70) + for item in sorted(items, key=lambda x: x['abb']): + abb = item['abb'] + result = item.get('result', '')[:10] + point = item.get('point', '') + unit = item.get('unit', '')[:18] + ref = item.get('reference', '')[:25] + marker = "✓" if abb in expected_abbs else " " + print(f" {marker} {abb:<13} {result:<12} {point:<4} {unit:<20} {ref}") + + coverage = len(matched) / len(expected_abbs) * 100 
if expected_abbs else 0 + print(f"\n Coverage: {coverage:.1f}% ({len(matched)}/{len(expected_abbs)})") + return coverage >= 70 # passes at 70% coverage or higher + + +# ============================================================ +# Test 4: combined classification + template matching +# ============================================================ +def test_classify_with_template(): + """Check that extracted items land in the correct module after classification.""" + print("\n" + "=" * 70) + print("[Test 4] Classification → template matching (combined)") + print("=" * 70) + + # Load the real configuration + config_path = os.path.join(os.path.dirname(__file__), "abb_mapping_config.json") + if not os.path.exists(config_path): + print(" [SKIP] config file not found") + return True + + with open(config_path, 'r', encoding='utf-8') as f: + config = json.load(f) + + # Simulated extracted data (English ABBs and English project names) + mock_items = [ + {"abb": "WBC", "project": "White Blood Cell", "result": "6.5", "point": "", "unit": "*10^3/mm3", "reference": "(4.0-10.0)", "source": "test.pdf"}, + {"abb": "ALT", "project": "Alanine Aminotransferase", "result": "25", "point": "", "unit": "U/L", "reference": "(0-41)", "source": "test.pdf"}, + {"abb": "TC", "project": "Total Cholesterol", "result": "195", "point": "", "unit": "mg/dL", "reference": "(<200)", "source": "test.pdf"}, + {"abb": "TSH", "project": "TSH", "result": "2.15", "point": "", "unit": "mIU/L", "reference": "(0.27-4.2)", "source": "test.pdf"}, + {"abb": "AFP", "project": "Alpha Fetoprotein", "result": "3.2", "point": "", "unit": "ng/mL", "reference": "(0-7)", "source": "test.pdf"}, + {"abb": "E2", "project": "Estradiol", "result": "28.5", "point": "", "unit": "pg/mL", "reference": "", "source": "test.pdf"}, + {"abb": "PT", "project": "Prothrombin Time", "result": "12.5", "point": "", "unit": "sec", "reference": "(10-14)", "source": "test.pdf"}, + {"abb": "HBsAg", "project": "HBsAg", "result": "Negative", "point": "", "unit": "", "reference": "", "source": "test.pdf"}, + {"abb": "Na", "project": "Sodium", "result": "140", "point": "", "unit": "mmol/L", "reference": "(136-145)", "source": "test.pdf"}, + {"abb": "Hcy", "project": 
"Homocysteine", "result": "10.5", "point": "", "unit": "umol/L", "reference": "(5-15)", "source": "test.pdf"}, + ] + + matched = match_with_template(mock_items, config) + print(f"\n 模板匹配结果: {len(matched)} 个项目") + + # 检查每个项目分类 + for abb in ['WBC', 'ALT', 'TC', 'TSH', 'AFP', 'E2', 'PT', 'HBsAg', 'Na', 'Hcy']: + data = matched.get(abb, {}) + project = data.get('project', '?') + result = data.get('result', '?') + module = classify_abb_module(abb, project, api_key=None) + print(f" {abb:<8} result={result:<10} → [{module}]") + + return len(matched) >= 8 + + +# ============================================================ +# 测试5: 边界情况 - 关键词冲突 +# ============================================================ +def test_keyword_conflicts(): + """测试潜在的关键词冲突场景""" + print("\n" + "=" * 70) + print("[测试5] 关键词冲突/边界测试") + print("=" * 70) + + test_cases = [ + # 长关键词应优先于短关键词 + ("X200", "红细胞沉降速率测定", "Immune Function"), # 不应匹配到 CBC 的 '红细胞' + ("X201", "红细胞计数", "Complete Blood Count"), # 应正常匹配 '红细胞' + # 白蛋白 vs 白细胞 + ("X202", "血清白蛋白", "Liver Function"), # '白蛋白' → Liver + ("X203", "白细胞分类计数", "Complete Blood Count"), # '白细胞' → CBC + # 甲状腺 vs 甲状旁腺 + ("X204", "甲状旁腺激素检测", "Bone Metabolism"), # '甲状旁腺' → Bone + ("X205", "甲状腺功能五项", "Thyroid"), # '甲状腺' → Thyroid + # 维生素D归属 + ("X206", "25-羟维生素d总量", "Bone Metabolism"), # '维生素d' → Bone (非Vitamin) + # 尿酸 不应匹配 尿液 + ("X207", "血清尿酸", "Kidney Function"), # '尿酸' → Kidney + # 胆固醇 不应匹配 胆红素 + ("X208", "总胆固醇", "Lipid Panel"), # '胆固醇' → Lipid + ("X209", "总胆红素", "Liver Function"), # '胆红素' → Liver + # 免疫缺陷病毒 + ("X210", "人类免疫缺陷病毒抗体", "Infectious Disease"), # 不应匹配 '免疫球蛋白' + ] + + passed = 0 + failed = 0 + for abb, project, expected in test_cases: + result = classify_abb_module(abb, project, api_key=None) + status = "OK" if result == expected else "FAIL" + if result == expected: + passed += 1 + else: + failed += 1 + icon = "✓" if status == "OK" else "✗" + print(f" {icon} {project:<25} 期望: {expected:<20} 实际: {result}") + + print(f"\n 结果: {passed} 通过, {failed} 失败 / 共 
{len(test_cases)} 项") + return failed == 0 + + +# ============================================================ +# Main entry point +# ============================================================ +def main(): + print("=" * 70) + print(" Medical data extraction logic tests") + print(" (offline local tests only; no OCR/DeepSeek API calls)") + print("=" * 70) + + results = {} + results["ABB hard-coded mapping"] = test_abb_mapping() + results["Chinese keyword matching"] = test_chinese_keyword_matching() + results["OCR text parsing"] = test_parse_ocr_text() + results["Classification + template matching"] = test_classify_with_template() + results["Keyword conflict detection"] = test_keyword_conflicts() + + # Summary + print("\n" + "=" * 70) + print(" Test summary") + print("=" * 70) + all_pass = True + for name, passed in results.items(): + icon = "✓ PASS" if passed else "✗ FAIL" + print(f" {icon} {name}") + if not passed: + all_pass = False + + print("=" * 70) + if all_pass: + print(" All tests passed!") + else: + print(" Some tests failed; see the details above") + print("=" * 70) + + return 0 if all_pass else 1 + + +if __name__ == "__main__": + sys.exit(main()) diff --git a/backend/test_ocr_and_classify.py b/backend/test_ocr_and_classify.py new file mode 100644 index 0000000..a062d42 --- /dev/null +++ b/backend/test_ocr_and_classify.py @@ -0,0 +1,187 @@ +# -*- coding: utf-8 -*- +""" +Live OCR extraction + classification test +Workflow: + 1. Call Baidu OCR on the PDFs in the 医疗报告智能体 folder + 2. Parse the OCR text with parse_medical_data_v2 + 3. Classify each item with classify_abb_module + 4. 
Print classification statistics +""" +import sys +import os +import io +import json +import time + +# Force UTF-8 output on Windows terminals +sys.stdout = io.TextIOWrapper(sys.stdout.buffer, encoding="utf-8", errors="replace") +sys.stderr = io.TextIOWrapper(sys.stderr.buffer, encoding="utf-8", errors="replace") + +sys.path.insert(0, os.path.dirname(os.path.abspath(__file__))) + +from pathlib import Path +from dotenv import load_dotenv +load_dotenv(Path(__file__).parent / ".env") + +from parse_medical_v2 import parse_medical_data_v2, clean_extracted_data_v2 +from extract_and_fill_report import ( + extract_pdf_text, classify_abb_module, match_with_template, + extract_patient_info +) + + +def main(): + pdf_dir = Path(r"c:\Users\UI\Desktop\医疗报告\医疗报告智能体") + config_path = Path(__file__).parent / "abb_mapping_config.json" + + pdf_files = list(pdf_dir.glob("*.pdf")) + if not pdf_files: + print("[ERROR] No PDF files found") + return + + print("=" * 70) + print(" Baidu OCR extraction + classification test") + print("=" * 70) + + # ========== Step 1: OCR extraction ========== + all_items = [] + for pdf_file in pdf_files: + print(f"\n📄 OCR extraction: {pdf_file.name} ({pdf_file.stat().st_size / 1024:.0f} KB)") + start = time.time() + text = extract_pdf_text(str(pdf_file)) + elapsed = time.time() - start + lines = [l for l in text.split('\n') if l.strip()] + print(f" ✓ OCR done | elapsed: {elapsed:.1f}s | lines: {len(lines)}") + + # Save the raw OCR text for debugging + ocr_output = Path(__file__).parent / "test_ocr_raw_text.txt" + with open(ocr_output, 'w', encoding='utf-8') as f: + f.write(text) + print(f" ✓ Raw OCR text saved: {ocr_output.name}") + + # Extract patient information + patient_info = extract_patient_info(text) + print(f"\n Patient information:") + print(f" Name: {patient_info.get('name', 'not extracted')}") + print(f" Sex: {patient_info.get('gender', 'not extracted')}") + print(f" Age: {patient_info.get('age', 'not extracted')}") + + # Parse the test items + items = parse_medical_data_v2(text, pdf_file.name) + items = clean_extracted_data_v2(items) + print(f"\n ✓ Parsed {len(items)} test items") + all_items.extend(items) + + if not all_items: + print("\n[WARN] No test items extracted; the OCR result may not be a blood test report") + 
print(" Check test_ocr_raw_text.txt for the raw OCR text") + return + + # ========== Step 2: classification test ========== + print("\n" + "=" * 70) + print(f" Classification test ({len(all_items)} test items)") + print("=" * 70) + + # Group by module + by_module = {} + unclassified = [] + + for item in all_items: + abb = item.get('abb', '') + project = item.get('project', abb) + result = item.get('result', '') + module = classify_abb_module(abb, project, api_key=None) + + item['classified_module'] = module + + if module == 'Other': + unclassified.append(item) + else: + if module not in by_module: + by_module[module] = [] + by_module[module].append(item) + + # Print classification counts + print(f"\n Classified: {len(all_items) - len(unclassified)}") + print(f" Unclassified (Other): {len(unclassified)}") + + # List items by module + print("\n" + "-" * 70) + for module, items in sorted(by_module.items()): + print(f"\n 📁 [{module}] ({len(items)} items)") + for item in items: + abb = item.get('abb', '?') + project = item.get('project', '')[:30] + result = item.get('result', '')[:15] + point = item.get('point', '') + print(f" {abb:<15} {project:<32} = {result:<15} {point}") + + if unclassified: + print(f"\n ⚠️ [Other - unclassified] ({len(unclassified)} items)") + for item in unclassified: + abb = item.get('abb', '?') + project = item.get('project', '')[:40] + result = item.get('result', '')[:15] + print(f" {abb:<15} {project:<42} = {result}") + + # ========== Step 3: template matching test ========== + print("\n" + "=" * 70) + print(" Template matching test") + print("=" * 70) + + with open(config_path, 'r', encoding='utf-8') as f: + config = json.load(f) + + matched = match_with_template(all_items, config) + print(f" Template matches: {len(matched)} items") + + # Count items per module + module_count = {} + for abb, data in matched.items(): + module = data.get('module', '') + if not module: + module = classify_abb_module(abb, data.get('project', abb), api_key=None) + if module not in module_count: + module_count[module] = 0 + module_count[module] += 1 + + print("\n Module distribution:") + for module, count in sorted(module_count.items(), key=lambda x: -x[1]): + print(f" 
{module:<30} {count} items") + + # ========== Summary ========== + print("\n" + "=" * 70) + print(" Summary") + print("=" * 70) + total = len(all_items) + classified = total - len(unclassified) + rate = classified / total * 100 if total else 0 + print(f" Total extracted: {total}") + print(f" Classified: {classified} ({rate:.1f}%)") + print(f" Unclassified: {len(unclassified)}") + print(f" Modules: {len(by_module)}") + print("=" * 70) + + # Save the results + output_path = Path(__file__).parent / "test_ocr_classify_result.json" + save_data = { + "total_items": total, + "classified": classified, + "unclassified_count": len(unclassified), + "modules": {m: len(items) for m, items in by_module.items()}, + "items": [{ + "abb": item.get("abb", ""), + "project": item.get("project", ""), + "result": item.get("result", ""), + "point": item.get("point", ""), + "unit": item.get("unit", ""), + "module": item.get("classified_module", "Other") + } for item in all_items] + } + with open(output_path, 'w', encoding='utf-8') as f: + json.dump(save_data, f, ensure_ascii=False, indent=2) + print(f"\n Results saved: {output_path.name}") + + +if __name__ == "__main__": + main() diff --git a/backend/test_ocr_classify_result.json b/backend/test_ocr_classify_result.json new file mode 100644 index 0000000..2879b83 --- /dev/null +++ b/backend/test_ocr_classify_result.json @@ -0,0 +1,886 @@ +{ + "total_items": 108, + "classified": 108, + "unclassified_count": 0, + "modules": { + "Liver Function": 17, + "Urine Detection": 1, + "Complete Blood Count": 24, + "Kidney Function": 2, + "Lipid Panel": 6, + "Glucose": 3, + "Immune Function": 11, + "Homocysteine": 1, + "Thyroid": 8, + "Infectious Disease": 6, + "Bone Metabolism": 5, + "Vitamin": 10, + "Tumor Markers": 14 + }, + "items": [ + { + "abb": "ALB", + "project": "白蛋白", + "result": "10", + "point": "", + "unit": "mg/L", + "module": "Liver Function" + }, + { + "abb": "pH", + "project": "酸碱度", + "result": "6.5", + "point": "", + "unit": "", + "module": "Urine Detection" + }, + { + "abb": "WBC", + "project": 
"白细胞计数(WBC)", + "result": "5.1", + "point": "", + "unit": "x10^9/L", + "module": "Complete Blood Count" + }, + { + "abb": "NEUT%", + "project": "中性粒细胞百分率(NEUT%)", + "result": "43.9", + "point": "", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "LYMPH%", + "project": "淋巴细胞百分率(LYMPH%)", + "result": "45.7", + "point": "", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "MONO%", + "project": "单核细胞百分率(MONO%)", + "result": "7.5", + "point": "", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "EOS%", + "project": "嗜酸性粒细胞百分率(EO%)", + "result": "2.3", + "point": "", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "BAS%", + "project": "嗜碱性粒细胞百分率(BASO%)", + "result": "0.6", + "point": "", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "NEUT", + "project": "中性粒细胞数(NEUT#)", + "result": "2.3", + "point": "", + "unit": "x10^9/L", + "module": "Complete Blood Count" + }, + { + "abb": "LYMPH", + "project": "淋巴细胞数(LYMPH#)", + "result": "2.4", + "point": "", + "unit": "x10^9/L", + "module": "Complete Blood Count" + }, + { + "abb": "MONO", + "project": "单核细胞数(MONO#)", + "result": "0.39", + "point": "", + "unit": "x10^9/L", + "module": "Complete Blood Count" + }, + { + "abb": "EOS", + "project": "嗜酸性粒细胞数(EO#)", + "result": "0.12", + "point": "", + "unit": "x10^9/L", + "module": "Complete Blood Count" + }, + { + "abb": "BAS", + "project": "嗜碱性粒细胞数(BASO#)", + "result": "0.03", + "point": "", + "unit": "x10^9/L", + "module": "Complete Blood Count" + }, + { + "abb": "RBC", + "project": "红细胞计数(RBC)", + "result": "3.77", + "point": "↓", + "unit": "x10^12/L", + "module": "Complete Blood Count" + }, + { + "abb": "Hb", + "project": "血红蛋白量(HGB)", + "result": "123", + "point": "↓", + "unit": "g/L", + "module": "Complete Blood Count" + }, + { + "abb": "HCT", + "project": "红细胞比积(HCT)", + "result": "38", + "point": "↓", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "MCV", + 
"project": "平均红细胞体积(MCV)", + "result": "100", + "point": "", + "unit": "fL", + "module": "Complete Blood Count" + }, + { + "abb": "MCH", + "project": "平均红细胞血红蛋白量(MCH)", + "result": "33", + "point": "", + "unit": "pg", + "module": "Complete Blood Count" + }, + { + "abb": "MCHC", + "project": "平均红细胞血红蛋白浓度(MCHC)", + "result": "326", + "point": "", + "unit": "g/L", + "module": "Complete Blood Count" + }, + { + "abb": "RDW-SD", + "project": "红细胞分布宽度-标准差(RDW-SD)", + "result": "45", + "point": "", + "unit": "fL", + "module": "Complete Blood Count" + }, + { + "abb": "RDW", + "project": "红细胞分布宽度-变异系数(RDW-CV)", + "result": "12.0", + "point": "", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "PLT", + "project": "血小板计数(PLT)", + "result": "163", + "point": "", + "unit": "x10^9/L", + "module": "Complete Blood Count" + }, + { + "abb": "PCT", + "project": "血小板比积(PCT)", + "result": "0.18", + "point": "", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "MPV", + "project": "平均血小板体积(MPV)", + "result": "10.9", + "point": "", + "unit": "fL", + "module": "Complete Blood Count" + }, + { + "abb": "PDW", + "project": "血小板分布宽度(PDW)", + "result": "16.0", + "point": "", + "unit": "fL", + "module": "Complete Blood Count" + }, + { + "abb": "P-LCR", + "project": "大型血小板比率(P-LCR)", + "result": "31", + "point": "", + "unit": "%", + "module": "Complete Blood Count" + }, + { + "abb": "TBil", + "project": "总胆红素", + "result": "8.3", + "point": "", + "unit": "umol/L", + "module": "Liver Function" + }, + { + "abb": "DBil", + "project": "直接胆红素", + "result": "1.7", + "point": "", + "unit": "umol/L", + "module": "Liver Function" + }, + { + "abb": "IBil", + "project": "间接胆红素", + "result": "6.6", + "point": "", + "unit": "umol/L", + "module": "Liver Function" + }, + { + "abb": "TP", + "project": "总蛋白", + "result": "72.4", + "point": "", + "unit": "g/L", + "module": "Liver Function" + }, + { + "abb": "ALB", + "project": "白蛋白", + "result": "44.8", + "point": "", + 
"unit": "g/L", + "module": "Liver Function" + }, + { + "abb": "GLB", + "project": "球蛋白", + "result": "27.6", + "point": "", + "unit": "g/L", + "module": "Liver Function" + }, + { + "abb": "A/G", + "project": "白球比值", + "result": "1.6", + "point": "", + "unit": "", + "module": "Liver Function" + }, + { + "abb": "CHE", + "project": "胆碱酯酶", + "result": "290", + "point": "", + "unit": "U/L", + "module": "Liver Function" + }, + { + "abb": "ALT", + "project": "谷丙转氨酶", + "result": "16", + "point": "", + "unit": "U/L", + "module": "Liver Function" + }, + { + "abb": "AST", + "project": "谷草转氨酶", + "result": "25", + "point": "", + "unit": "U/L", + "module": "Liver Function" + }, + { + "abb": "GGT", + "project": "γ-谷氨酰基转移酶", + "result": "22", + "point": "", + "unit": "U/L", + "module": "Liver Function" + }, + { + "abb": "ALP", + "project": "碱性磷酸酶", + "result": "61", + "point": "", + "unit": "U/L", + "module": "Liver Function" + }, + { + "abb": "Tf", + "project": "转铁蛋白", + "result": "2.43", + "point": "", + "unit": "g/L", + "module": "Liver Function" + }, + { + "abb": "Tf", + "project": "转铁蛋白", + "result": "43.57", + "point": "", + "unit": "mg/L", + "module": "Liver Function" + }, + { + "abb": "CysC", + "project": "胱抑素C", + "result": "0.90", + "point": "", + "unit": "mg/L", + "module": "Kidney Function" + }, + { + "abb": "β2-MG", + "project": "血清β2微球蛋白", + "result": "1.8", + "point": "", + "unit": "mg/L", + "module": "Kidney Function" + }, + { + "abb": "ALB", + "project": "白蛋白", + "result": "1.0", + "point": "", + "unit": "mg/L", + "module": "Liver Function" + }, + { + "abb": "GLB", + "project": "球蛋白", + "result": "70", + "point": "", + "unit": "ug/L", + "module": "Liver Function" + }, + { + "abb": "TG", + "project": "甘油三酯", + "result": "1.37", + "point": "", + "unit": "mmol/L", + "module": "Lipid Panel" + }, + { + "abb": "TC", + "project": "总胆固醇", + "result": "4.67", + "point": "", + "unit": "mmol/L", + "module": "Lipid Panel" + }, + { + "abb": "HDL", + "project": "高密度脂蛋白胆固醇", 
+ "result": "1.52", + "point": "", + "unit": "mmol/L", + "module": "Lipid Panel" + }, + { + "abb": "LDL", + "project": "低密度脂蛋白胆固醇", + "result": "2.50", + "point": "", + "unit": "mmol/L", + "module": "Lipid Panel" + }, + { + "abb": "FFA", + "project": "游离脂肪酸", + "result": "0.66", + "point": "", + "unit": "mmol/L", + "module": "Lipid Panel" + }, + { + "abb": "INS", + "project": "胰岛素(空腹)", + "result": "8.3", + "point": "", + "unit": "μU/ml", + "module": "Glucose" + }, + { + "abb": "FBS", + "project": "葡萄糖(空腹)", + "result": "5.41", + "point": "", + "unit": "mmol/L", + "module": "Glucose" + }, + { + "abb": "HbA1C", + "project": "糖化血红蛋白", + "result": "5.6", + "point": "", + "unit": "%", + "module": "Glucose" + }, + { + "abb": "CK", + "project": "肌酸激酶", + "result": "136", + "point": "", + "unit": "U/L", + "module": "Immune Function" + }, + { + "abb": "CK-MB", + "project": "肌酸激酶同工酶MB", + "result": "11", + "point": "", + "unit": "U/L", + "module": "Immune Function" + }, + { + "abb": "hs-CRP", + "project": "超敏C反应蛋白", + "result": "0.5", + "point": "", + "unit": "mg/L", + "module": "Immune Function" + }, + { + "abb": "Hcy", + "project": "同型半胱氨酸", + "result": "9.7", + "point": "", + "unit": "umol/L", + "module": "Homocysteine" + }, + { + "abb": "Lp(a)", + "project": "脂蛋白(a)", + "result": "26", + "point": "", + "unit": "mg/L", + "module": "Lipid Panel" + }, + { + "abb": "Tg", + "project": "甲状腺球蛋白", + "result": "8.8", + "point": "", + "unit": "ng/ml", + "module": "Thyroid" + }, + { + "abb": "T3", + "project": "三碘甲状腺原氨酸T3", + "result": "1.31", + "point": "", + "unit": "nmol/L", + "module": "Thyroid" + }, + { + "abb": "T4", + "project": "甲状腺素T4", + "result": "99.0", + "point": "", + "unit": "nmol/L", + "module": "Thyroid" + }, + { + "abb": "FT3", + "project": "游离三碘甲状腺原氨酸FT3", + "result": "4.35", + "point": "", + "unit": "pmol/L", + "module": "Thyroid" + }, + { + "abb": "FT4", + "project": "游离甲状腺素FT4", + "result": "14.30", + "point": "", + "unit": "pmol/L", + "module": "Thyroid" + 
}, + { + "abb": "TSH", + "project": "促甲状腺素TSH", + "result": "1.55", + "point": "", + "unit": "mIU/L", + "module": "Thyroid" + }, + { + "abb": "TgAb", + "project": "抗甲状腺球蛋白抗体", + "result": "16.5", + "point": "", + "unit": "IU/ml", + "module": "Thyroid" + }, + { + "abb": "TPO-Ab", + "project": "抗甲状腺过氧化物酶抗体", + "result": "13.1", + "point": "", + "unit": "IU/ml", + "module": "Thyroid" + }, + { + "abb": "PGI", + "project": "胃蛋白酶原I", + "result": "98.4", + "point": "", + "unit": "ng/ml", + "module": "Immune Function" + }, + { + "abb": "G-17", + "project": "胃泌素-17", + "result": "2.9", + "point": "", + "unit": "pmol/L", + "module": "Immune Function" + }, + { + "abb": "PGII", + "project": "胃蛋白酶原Ⅱ", + "result": "11.1", + "point": "", + "unit": "ng/ml", + "module": "Immune Function" + }, + { + "abb": "PGR", + "project": "胃蛋白酶原比值", + "result": "8.9", + "point": "", + "unit": "", + "module": "Immune Function" + }, + { + "abb": "HBsAg", + "project": "乙肝表面抗原", + "result": "0.87", + "point": "", + "unit": "COI", + "module": "Infectious Disease" + }, + { + "abb": "HBsAb", + "project": "乙肝表面抗体", + "result": "<2.00", + "point": "", + "unit": "IU/L", + "module": "Infectious Disease" + }, + { + "abb": "HBeAg", + "project": "乙肝e抗原", + "result": "0.10", + "point": "", + "unit": "COI", + "module": "Infectious Disease" + }, + { + "abb": "HBeAb", + "project": "乙肝e抗体", + "result": "1.40", + "point": "", + "unit": "COI", + "module": "Infectious Disease" + }, + { + "abb": "HBcAb", + "project": "乙肝核心抗体", + "result": "0.01", + "point": "", + "unit": "COI", + "module": "Infectious Disease" + }, + { + "abb": "HBcAb", + "project": "乙肝核心抗体", + "result": "阳性", + "point": "", + "unit": "", + "module": "Infectious Disease" + }, + { + "abb": "CRP", + "project": "C反应蛋白", + "result": "0.5", + "point": "", + "unit": "mg/L", + "module": "Immune Function" + }, + { + "abb": "ASO", + "project": "抗链球菌溶血素\"0\"", + "result": "32", + "point": "", + "unit": "IU/ml", + "module": "Immune Function" + }, + { + "abb": 
"ANA", + "project": "抗核抗体", + "result": "0.9", + "point": "", + "unit": "AU/ml", + "module": "Immune Function" + }, + { + "abb": "RF", + "project": "类风湿因子", + "result": "5", + "point": "", + "unit": "IU/ml", + "module": "Immune Function" + }, + { + "abb": "PTH", + "project": "甲状旁腺素", + "result": "5.9", + "point": "", + "unit": "pmol/L", + "module": "Bone Metabolism" + }, + { + "abb": "OST", + "project": "骨钙素", + "result": "15.4", + "point": "", + "unit": "ng/ml", + "module": "Bone Metabolism" + }, + { + "abb": "VitB12", + "project": "维生素B12", + "result": "497", + "point": "", + "unit": "pmol/L", + "module": "Vitamin" + }, + { + "abb": "Fer", + "project": "血清铁蛋白", + "result": "86", + "point": "", + "unit": "ng/ml", + "module": "Vitamin" + }, + { + "abb": "Folate", + "project": "维生素B9(叶酸)血药浓度测定", + "result": "12.18", + "point": "", + "unit": "ng/ml", + "module": "Vitamin" + }, + { + "abb": "25-OH-VD2+D3", + "project": "25-羟基维生素D血药浓度测定", + "result": "19.91", + "point": "↓", + "unit": "ng/ml", + "module": "Bone Metabolism" + }, + { + "abb": "VitA", + "project": "维生素A血药浓度测定", + "result": "564.54", + "point": "", + "unit": "ng/ml", + "module": "Vitamin" + }, + { + "abb": "VD3", + "project": "25-羟基维生素D3血药浓度测定", + "result": "19.63", + "point": "", + "unit": "ng/ml", + "module": "Bone Metabolism" + }, + { + "abb": "VD2", + "project": "25-羟基维生素D2血药浓度测定", + "result": "0.28", + "point": "", + "unit": "ng/ml", + "module": "Bone Metabolism" + }, + { + "abb": "VitE", + "project": "维生素E血药浓度测定", + "result": "9.29", + "point": "", + "unit": "ug/ml", + "module": "Vitamin" + }, + { + "abb": "VitB2", + "project": "维生素B2血药浓度测定", + "result": "7.90", + "point": "", + "unit": "ng/ml", + "module": "Vitamin" + }, + { + "abb": "VitB1", + "project": "维生素B1血药浓度测定", + "result": "1.67", + "point": "↓", + "unit": "ng/ml", + "module": "Vitamin" + }, + { + "abb": "VitB5", + "project": "维生素B5血药浓度测定", + "result": "50.25", + "point": "", + "unit": "ng/ml", + "module": "Vitamin" + }, + { + "abb": 
"VitB3", + "project": "维生素B3血药浓度测定", + "result": "29.62", + "point": "", + "unit": "ng/ml", + "module": "Vitamin" + }, + { + "abb": "VitB6", + "project": "维生素B6血药浓度测定", + "result": "4.67", + "point": "↓", + "unit": "ng/ml", + "module": "Vitamin" + }, + { + "abb": "AFP", + "project": "甲胎蛋白", + "result": "0.5", + "point": "", + "unit": "ng/ml", + "module": "Tumor Markers" + }, + { + "abb": "CEA", + "project": "癌胚抗原", + "result": "1.3", + "point": "", + "unit": "ng/ml", + "module": "Tumor Markers" + }, + { + "abb": "CA19-9", + "project": "糖类抗原19-9", + "result": "9.9", + "point": "", + "unit": "U/ml", + "module": "Tumor Markers" + }, + { + "abb": "CA72-4", + "project": "糖类抗原72-4", + "result": "2.6", + "point": "", + "unit": "U/ml", + "module": "Tumor Markers" + }, + { + "abb": "CA24-2", + "project": "糖类抗原24-2", + "result": "7.3", + "point": "", + "unit": "U/ml", + "module": "Tumor Markers" + }, + { + "abb": "CA50", + "project": "糖类抗原50", + "result": "8.0", + "point": "", + "unit": "U/ml", + "module": "Tumor Markers" + }, + { + "abb": "CA125", + "project": "糖类抗原125", + "result": "4.9", + "point": "", + "unit": "U/ml", + "module": "Tumor Markers" + }, + { + "abb": "NSE", + "project": "神经元特异性烯醇化酶", + "result": "2.6", + "point": "", + "unit": "ng/ml", + "module": "Tumor Markers" + }, + { + "abb": "CYFRA21-1", + "project": "细胞角蛋白19片段", + "result": "1.6", + "point": "", + "unit": "ng/ml", + "module": "Tumor Markers" + }, + { + "abb": "ProGRP", + "project": "胃泌素释放肽前体", + "result": "30.0", + "point": "", + "unit": "pg/ml", + "module": "Tumor Markers" + }, + { + "abb": "SCC", + "project": "鳞状细胞癌相关抗原", + "result": "2.32", + "point": "", + "unit": "ng/ml", + "module": "Tumor Markers" + }, + { + "abb": "TPSA", + "project": "总前列腺特异抗原", + "result": "0.48", + "point": "", + "unit": "ng/ml", + "module": "Tumor Markers" + }, + { + "abb": "FPSA", + "project": "游离前列腺特异抗原", + "result": "0.25", + "point": "", + "unit": "ng/ml", + "module": "Tumor Markers" + }, + { + "abb": "F/TPSA", + 
"project": "游离PSA/总PSA", + "result": "0.52", + "point": "", + "unit": "", + "module": "Tumor Markers" + } + ] +} \ No newline at end of file diff --git a/backend/test_ocr_raw_text.txt b/backend/test_ocr_raw_text.txt new file mode 100644 index 0000000..92a00d6 --- /dev/null +++ b/backend/test_ocr_raw_text.txt @@ -0,0 +1,876 @@ +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +一、血压检查 +检查日期 20250418 检查医师 浦湘菊 +检查名称 检查结果 参考值 单位 +血压(坐姿收缩压) 123 90-140 /mmHg +血压(坐姿舒张压) 77 60-90 /mmHg +科室小结 +未见明显异常 +二、内科检查 +检查日期 20250418 检查医师 杨素芳 +检查名称 检查结果 参考值 单位 +既往病史 * 既往病史:高血压 +* 既往病史:尿酸高 +心音 正常 +心律 心律齐 +肝脾 肋下未及 +肺及呼吸道 未见明显异常 +皮肤、浅表淋巴结 无肿大,无压痛 +医生建议 无 +腹部 平软,无包块,无压痛 +科室小结 +1、既往病史:高血压 +2、既往病史:尿酸高 +三、外科检查 +检查日期 20250418 检查医师 游洋 +检查名称 检查结果 参考值 单位 +肛门 拒检指检 +前列腺 拒检指检 +皮肤 未见明显异常 +泌尿生殖 拒检泌尿生殖 +疝 无 +浅表淋巴结 无肿大,无压痛 +甲状腺 无肿大 +打印日期20250428 第1页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证 320902196601223511 检查日期20250418 +脊柱 未见明显异常 +四肢关节 活动自如 +外科其它 未见明显异常 +医生建议 无 +科室小结 +未见明显异常 +四、耳鼻喉科检查 +检查日期 20250418 检查医师 许万云 +检查名称 检查结果 参考值 单位 +听力 正常 +耳廓 未见明显异常 +外耳道 未见明显异常 +鼓膜 未见明显异常 +鼻中隔 * 鼻中隔糜烂 +鼻疾 未见明显异常 +其它疾病 无 +扁桃体 未见明显异常 +咽 未见明显异常 +口腔粘膜 未见明显异常 +医生建议 无 +科室小结 +1、鼻中隔糜烂 +五、心电图检查 +检查日期 20250418 检查医师 陈娟 +检查名称 检查结果 参考值 单位 +P-R间期 正常 +P波 正常 +QRS 正常 +ST段 正常 +T波 正常 +传导 正常 +心律 正常 +医生建议 无 +打印日期20250428 第2页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证 320902196601223511 检查日期20250418 +科室小结 +未见明显异常 +六、眼科检查 +检查日期 20250418 检查医师 孙建宁 +检查名称 检查结果 参考值 单位 +光学相干断层成像(OCT) 未见明显异常 +裂隙灯检查 * 双眼白内障初期 +外眼 未见明显异常 +眼底 * 双眼视网膜动脉硬化Ⅰ +左眼眼压 12.0 8.000-21.000 mmHg +右眼眼压 13.0 8.000-21.000 mmHg +医生建议 无 +科室小结 +1、双眼白内障初期 +2、双眼视网膜动脉硬化I +七、口腔科检查 +检查日期 20250418 检查医师 陆泽锋 +检查名称 检查结果 参考值 单位 +龋齿 无 +牙周 * 中度牙周炎 +口腔粘膜 正常 +其它 未见明显异常 +医生建议 * 全口牙周治疗 +科室小结 +1、中度牙周炎 +2、全口牙周治疗 +八、尿液分析检查 +检查日期 20250418 检查医师 叶安安 +检查名称 检查结果 参考值 单位 +颜色 深黄色 +透明度 清亮 +胆红素 阴性 +打印日期20250428 第3页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +尿胆原 正常 阴性-弱阳性 +酮体 阴性 +葡萄糖 阴性 +隐血 阴性 +亚硝酸盐 阴性 +白细胞酯酶 阴性 +蛋白质 阴性 
+白蛋白 10 <=20 mg/L +肌酐 200 mg/dl +蛋白质肌酐比值 阴性 阴性 +白蛋白肌酐比值 阴性 阴性 +酸碱度 6.5 4.5-8 +比重 1.013 1.003-1.03 +渗透压 382 338-1039 m0sm/kg +电导率 11.1 3.1000-39.0000 mS/cm +红细胞 1.3 <=19 /ul +白细胞 1.2 <=24 /ul +白细胞团 0.0 <=23 /ul +上皮细胞 1.1 <=31 /ul +鳞状上皮 0.8 <=31 /ul +非鳞状上皮 0.2 <=4.1 /ul +管型 0.00 <=1 /ul +透明管型 0.00 <=1 /ul +病理管型 0.00 <=1 /ul +细菌 0.0 <=1200 /ul +类酵母菌 0.0 <=1 /ul +结晶 0.0 <=10 /ul +精子 0.0 <=1 /ul +粘液丝 0.00 <=1 /ul +红细胞形态信息 未提示 +尿路感染信息 未提示 +细菌信息 未提示 +科室小结 +未见明显异常 +九、血型检查 +检查日期 20250424 检查医师 刘亚东 +检查名称 检查结果 参考值 单位 +打印日期20250428 第4页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +ABO血型 A型 +Rh(D)血型 阳性 +科室小结 +未见明显异常 +十、血常规检查 +检查日期 20250418 检查医师 卞成思 +检查名称 检查结果 参考值 单位 +白细胞计数(WBC) 5.1 3.5-9.5 x10^9/L +中性粒细胞百分率(NEUT%) 43.9 40-75 % +淋巴细胞百分率(LYMPH%) 45.7 20-50 % +单核细胞百分率(MONO%) 7.5 3-10 % +嗜酸性粒细胞百分率(EO%) 2.3 0.4-8 % +嗜碱性粒细胞百分率(BASO%) 0.6 <=1 % +中性粒细胞数(NEUT#) 2.3 1.8-6.3 x10^9/L +淋巴细胞数(LYMPH#) 2.4 1.1-3.2 x10^9/L +单核细胞数(MONO#) 0.39 0.1-0.6 x10^9/L +嗜酸性粒细胞数(EO#) 0.12 0.02-0.52 x10^9/L +嗜碱性粒细胞数(BASO#) 0.03 <=0.06 x10^9/L +红细胞计数(RBC) 3.77 4.3-5.8 x10^12/L +血红蛋白量(HGB) 123 130-175 g/L +红细胞比积(HCT) ↓ 38 40-50 % +平均红细胞体积(MCV) 100 82-100 fL +平均红细胞血红蛋白量(MCH) 33 27-34 pg +平均红细胞血红蛋白浓度(MCHC) 326 316-354 g/L +红细胞分布宽度-标准差(RDW-SD) 45 37-50 fL +红细胞分布宽度-变异系数(RDW-CV) 12.0 11.6-14.4 % +血小板计数(PLT) 163 125-350 x10^9/L +血小板比积(PCT) 0.18 0.17-0.35 % +平均血小板体积(MPV) 10.9 9-13 fL +血小板分布宽度(PDW) 16.0 9-17 fL +大型血小板比率(P-LCR) 31 13-43 % +科室小结 +1、红细胞计数降低 +2、血红蛋白量降低 +3、红细胞比积偏低 +打印日期20250428 第5页,共26页 +健康管理体检报告 +姓名姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +十一、肝功能检查 +检查日期 20250419 检查医师 冯瑞祥 +检查名称 检查结果 参考值 单位 +总胆红素 8.3 3-26 umol/L +直接胆红素 1.7 <=7 umol/L +间接胆红素 6.6 1.7-17 umol/L +总蛋白 72.4 65-85 g/L +白蛋白 44.8 40-55 g/L +球蛋白 27.6 20-40 g/L +白球比值 1.6 1.2-2.4 +胆碱酯酶 290 203-460 U/L +谷丙转氨酶 16 9-50 U/L +谷草转氨酶 25 15-40 U/L +γ-谷氨酰基转移酶 22 10-60 U/L +碱性磷酸酶 61 45-125 U/L +转铁蛋白 2.43 2.00-3.60 g/L +糖缺失性转铁蛋白 43.57 25.80-65.70 mg/L +糖缺失性转铁蛋白百分率 1.79 1.06-2.60 % +科室小结 +未见明显异常 
+十二、肾功能检查 +检查日期 20250419 检查医师 冯瑞祥 +检查名称 检查结果 参考值 单位 +尿素 5.5 3.1-8.0 mmol/L +肌酐 90 57-97 umol/L +尿酸 285 202-416 umol/L +胱抑素C 0.90 0.55-1.05 mg/L +血清β2微球蛋白 1.8 1.0-2.3 mg/L +尿肌酐 11064 mmol/L +尿微量白蛋白 1.0 <=20 mg/L +尿微量白蛋白尿肌酐比值 0.8 <=30 mg/g +尿β2微球蛋白 70 <=300 ug/L +尿β2微球蛋白尿肌酐比值 56 <=200 ug/g +科室小结 +未见明显异常 +打印日期20250428 第6页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +十三、血脂检查 +检查日期 20250419 检查医师 冯瑞祥 +检查名称 检查结果 参考值 单位 +甘油三酯 1.37 0.45-1.69 mmol/L +总胆固醇 4.67 2.33-5.17 mmol/L +高密度脂蛋白胆固醇 1.52 0.91-2.06 mmol/L +低密度脂蛋白胆固醇 2.50 2.07-3.36 mmol/L +游离脂肪酸 0.66 0.1-0.77 mmol/L +科室小结 +未见明显异常 +十四、血糖及相关检查 +检查日期 20250421 检查医师 彭丹亚 +检查名称 检查结果 参考值 单位 +胰岛素(空腹) 8.3 2.6-24.9 μU/ml +葡萄糖(空腹) 5.41 3.9-6.1 mmol/L +糖化血红蛋白 5.6 4.0-6.5 % +科室小结 +未见明显异常 +十五、心肌酶检查 +检查日期 20250419 检查医师 冯瑞祥 +检查名称 检查结果 参考值 单位 +肌酸激酶 136 50-310 U/L +肌酸激酶同工酶MB 11 <=24 U/L +科室小结 +未见明显异常 +十六、心血管病风险因子检查 +检查日期 20250419 检查医师 冯瑞祥 +检查名称 检查结果 参考值 单位 +超敏C反应蛋白 0.5 <=3.0 mg/L +打印日期20250428 第7页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +同型半胱氨酸 9.7 <=15 umol/L +脂蛋白(a) 26 <=300 mg/L +科室小结 +未见明显异常 +十七、甲状腺功能检查 +检查日期 20250421 检查医师 彭丹亚 +检查名称 检查结果 参考值 单位 +甲状腺球蛋白 8.8 3.5-77.0 ng/ml +三碘甲状腺原氨酸T3 1.31 1.3-2.4 nmol/L +甲状腺素T4 99.0 70-140 nmol/L +游离三碘甲状腺原氨酸FT3 4.35 3.82-6.30 pmol/L +游离甲状腺素FT4 14.30 12.80-21.30 pmol/L +促甲状腺素TSH 1.55 0.75-5.60 mIU/L +抗甲状腺球蛋白抗体 16.5 <115.0 IU/ml +抗甲状腺过氧化物酶抗体 13.1 <34.0 IU/ml +科室小结 +未见明显异常 +十八、胃功能检查 +检查日期 20250421 检查医师 陆朝阳 +检查名称 检查结果 参考值 单位 +胃蛋白酶原I 98.4 >=30 ng/ml +胃泌素-17 2.9 1.7-7.6 pmol/L +胃蛋白酶原Ⅱ 11.1 ng/ml +胃蛋白酶原比值 8.9 >=3 +科室小结 +未见明显异常 +十九、感染标志物检查 +检查日期 20250424 检查医师 陈瑾 +检查名称 检查结果 参考值 单位 +EB病毒DNA 未检出 100 IU/ml +乙肝表面抗原 0.87 <1.0 COI +打印日期20250428 第8页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +乙肝表面抗体 <2.00 <10.0 IU/L +乙肝e抗原 0.10 <1.0 COI +乙肝e抗体 1.40 >1.0 COI +乙肝核心抗体 0.01 >1.0 COI +科室小结 +1、乙肝核心抗体阳性 +二十、风湿病检查 +检查日期 20250419 检查医师 冯瑞祥 +检查名称 检查结果 参考值 单位 +C反应蛋白 0.5 <=6.0 mg/L +抗链球菌溶血素"0" 32 <=160 
IU/ml +抗核抗体 0.9 <40.0 AU/ml +类风湿因子 5 <=20 IU/ml +科室小结 +未见明显异常 +二十一、电解质检查 +检查日期 20250424 检查医师 柏玉 +检查名称 检查结果 参考值 单位 +钾 3.99 3.5-5.5 mmol/L +钠 139.3 135-145 mmol/L +氯 105.7 98--108 mmol/L +总钙 2.42 2.11-2.52 mmol/L +磷 1.04 0.85-1.51 mmol/L +科室小结 +未见明显异常 +二十二、骨矿代谢检查 +检查日期 20250421 检查医师 彭丹亚 +检查名称 检查结果 参考值 单位 +甲状旁腺素 5.9 1.6-6.9 pmol/L +骨钙素 15.4 5.58-28.62 ng/ml +打印日期20250428 第9页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +科室小结 +未见明显异常 +二十三、贫血检查 +检查日期 20250421 检查医师 彭丹亚 +检查名称 检查结果 参考值 单位 +维生素B12 497 145-569 pmol/L +血清铁蛋白 86 31.3-408.5 ng/ml +科室小结 +未见明显异常 +二十四、维生素测定检查 +检查日期 20250424 检查医师 赵娴 +检查名称 检查结果 参考值 单位 +维生素B9(叶酸)血药浓度测定 12.18 >4 ng/ml +25-羟基维生素D血药浓度测定 19.91 30-100 ng/ml +维生素A血药浓度测定 564.54 325-780 ng/ml +25-羟基维生素D3血药浓度测定 19.63 无参考范围 ng/ml +25-羟基维生素D2血药浓度测定 0.28 无参考范围 ng/ml +维生素E血药浓度测定 9.29 5--18 ug/ml +维生素K1血药浓度测定 f 3.40 0.13--1.88 ng/ml +维生素B2血药浓度测定 7.90 2.33-14.69 ng/ml +维生素B1血药浓度测定 ↓ 1.67 2.4-9.02 ng/ml +维生素B5血药浓度测定 50.25 12.9-253.1 ng/ml +维生素B3血药浓度测定 29.62 5.2-72.1 ng/ml +维生素B6血药浓度测定 4.67 4.9-30.9 ng/ml +科室小结 +1、25-羟基维生素D血药浓度测定偏低 +2、维生素K1血药浓度测定偏高 +3、维生素B1血药浓度测定偏低 +4、维生素B6血药浓度测定偏低 +二十五、肿瘤标志物检查 +检查日期 20250421 检查医师 陆朝阳 +检查名称 检查结果 参考值 单位 +打印日期20250428 第10页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +甲胎蛋白 0.5 <=7.0 ng/ml +癌胚抗原 1.3 <=5 ng/ml +糖类抗原19-9 9.9 <=30 U/ml +糖类抗原72-4 2.6 <6.9 U/ml +糖类抗原24-2 7.3 <20.0 U/ml +糖类抗原50 8.0 <25.0 U/ml +糖类抗原125 4.9 <=24 U/ml +神经元特异性烯醇化酶 2.6 <16.3 ng/ml +细胞角蛋白19片段 1.6 <3.3 ng/ml +胃泌素释放肽前体 30.0 28.3-74.4 pg/ml +鳞状细胞癌相关抗原 2.32 <=2.7 ng/ml +总前列腺特异抗原 0.48 <=4 ng/ml +游离前列腺特异抗原 0.25 <=0.93 ng/ml +游离PSA/总PSA 0.52 0.25-1 +科室小结 +未见明显异常 +二十六、碳十三呼气试验检查 +检查日期 20250418 检查医师 张小芳 +检查名称 检查结果 参考值 单位 +碳13尿素呼气试验结果 阴性 阴性 +碳13尿素呼气试验DOB值 0.73 <4.0 +科室小结 +未见明显异常 +二十七、心脏彩色超声检查 +检查日期 20250419 检查医师 董静 +检查名称 检查结果 参考值 单位 +心脏彩色超声检查所见 心脏各房、室腔内径正常,主动脉窦部内径正常。各瓣膜形态、回 +声及开放活动未见明显异常。房间隔及室间隔连续未见中断。左 +室壁厚度正常,静息状态下未见明显节段性左室壁运动异常。心包 +及心包腔未见明显异常。CDFI:二、三尖瓣房侧及主动脉瓣下可见 +返流束。二尖瓣口舒张期血流频谱E峰大于A峰。组织多普勒显 
+像(TDI):二尖瓣环E'/A'小于1。 +心脏彩色超声检查提示 * 主动脉瓣返流(轻微) +* 二尖瓣返流(轻度) +打印日期20250428 第11页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证 320902196601223511 检查日期20250418 +* 三尖瓣返流(轻度) +* 左室松弛性异常 +左室收缩功能正常 +心脏穿透 健康服务中心 EPIQTC TIS0.6MI1.4 2025/04/18 07:57:35 心脏穿透 健康服务中心 EPIQTC TIS1.2 MI1.1 07:59.54 +S5-1 45Hz 18cm S5-1 18cm M1M4 +20 20 75% C50 +C50 HRes ,: HRes +彩色血流 4000Hz +WF 309H +1.63.2 1.63.2 小® +心脏穿透 健康服务中心 EPIQTC TIS1.2 MI 1.1 08:00.54 +S5-1 21Hz 18cm M1M4 +20 C50 +HRes ,: +彩色血流 4000Hz +WESRCL Ervs +1.63.2 ® +科室小结 +1、主动脉瓣返流(轻微) +2、二尖瓣返流(轻度) +3、三尖瓣返流(轻度) +4、左室松弛性异常 +二十八、腹部彩色超声检查 +检查日期 20250419 检查医师 傅宁华 +检查名称 检查结果 参考值 单位 +腹部彩色超声检查所见 肝脏:肝脏大小形态正常,包膜光整,肝内管系走向清晰。于肝 +内可见数个无回声区,较大约23*22mm,边界清,后方回声增强。 +胆囊:胆囊切除术后,胆总管内径约8mm,显示段未见明显异常回 +声。肾脏:双侧肾脏大小形态正常,皮髓质分界清晰,集合系统 +未见分离。左侧肾脏上极可见一个无回声区,壁薄,边界清,后 +方回声增强,大小约39*39mm。左侧肾脏内可见数个强回声光团, +较大直径约8mm,后方伴声影。右侧肾脏内可见数个强回声光团, +较大直径约7mm,后方伴声影。胰腺:胰腺大小形态正常,边界清 +晰,内部回声均匀,其内未见明显异常,主胰管未见明显扩张。 +打印日期20250428 第12页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +脾脏:脾脏大小形态正常,回声均匀,其内未见明显异常。 +腹部彩色超声检查提示 * 肝囊肿 +* 胆囊切除术后 +* 左侧肾囊肿 +* 双侧多发性肾结石 +胰腺声像图未见异常 +脾脏声像图未见异常 +MI1.2Tls 0.6 C1-6 DlanYo LlaoYangYuar +-AO% -AO +L 0.84 cm +545:550130.7:31.1 +25/04/1807:39:37 +-AO% -AO% +L0.73 cm 3.90 cm +291:294(16.4:16.5 914:915(51.1:51.1 +DlanYo LlaoYangYua +L 0.88 cm +243:252(13.7:14.1 +5/04/1807:42:00 Place the last poin +科室小结 +1、肝囊肿 +2、胆囊切除术后 +3、左侧肾囊肿 +4、双侧多发性肾结石 +打印日期20250428 第13页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +二十九、甲状腺彩色超声检查 +检查日期 20250419 检查医师 傅宁华 +检查名称 检查结果 参考值 单位 +甲状腺彩色超声检查所见 甲状腺切面形态大小正常,表面光滑,包膜完整,内部回声均匀, +其内未见明显异常回声,CDFI示血流信号未见异常。 +甲状腺彩色超声检查提示 甲状腺声像图未见异常 +DlanYo LlaoYangYuar 25/04/1807:36:32 MI 0.7 Tls 0.5 ML6-15 +Generi +IIAO% 3/16 +科室小结 +未见明显异常 +三十、颈动脉彩色超声检查 +检查日期 20250419 检查医师 傅宁华 +检查名称 检查结果 参考值 单位 +颈动脉彩色超声检查所见 双侧颈动脉走行及内径正常,内中膜光滑,厚度范围正常,未见 +明显斑块形成。CDFI示管腔内血流充盈良好,血流速度及频谱形 +态未见明显异常。 +颈动脉彩色超声检查提示 双侧颈动脉声像图及多普勒血流频谱未见明显异常 +DlanYo LlaoYangYua 25004/18.07-37-34 MI 0.7 Tls 0.5 
ML6-15 +A/B Ratio +科室小结 +未见明显异常 +打印日期20250428 第14页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +三十一、前列腺彩色超声检查 +检查日期 20250419 检查医师 傅宁华 +检查名称 检查结果 参考值 单位 +前列腺彩色超声检查所见 膀胱:膀胱不充盈。前列腺:经腹壁检查,前列腺大小约 +46*27mm,形态尚规则,包膜光整,回声欠均匀。 +前列腺彩色超声检查提示 * 前列腺轻度增生 +DlanYo LlaoYangYua MI 1.2Tls 0.6 C1-6 +科室小结 +1、前列腺轻度增生 +三十二、经颅多普勒彩色超声检查 +检查日期 20250418 检查医师 李倩 +检查名称 检查结果 参考值 单位 +TCD提示 * 双侧椎动脉流速减慢 +科室小结 +1、双侧椎动脉流速减慢 +三十三、人体成分分析检查 +检查日期 20250418 检查医师 庞燕 +检查名称 检查结果 参考值 单位 +体型评估 肥胖 +身高 165.3 公分(cm) +体重 65.6 公斤(Kg) +BMI 24.0 18.50-23.99 +科室小结 +1、体质指数偏高 +打印日期20250428 第15页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +三十四、骨密度检查 +检查日期 20250418 检查医师 周丹 +检查名称 检查结果 参考值 单位 +骨密度检查结论 * 骨质减少 +科室小结 +1、骨质减少 +三十五、肺功能检查 +检查日期 20250418 检查医师 汪向雨 +检查名称 检查结果 参考值 单位 +肺功能检查 通气功能大致正常 +科室小结 +未见明显异常 +三十六、胸部CT检查 +检查日期 20250420 检查医师 鞠兵 +检查名称 检查结果 参考值 单位 +胸部CT检查所见 胸廓对称,肋骨及胸壁软组织未见异常。肺窗示左肺舌叶可见少 +许纤维条索影,两肺内可见2-4mm实性、磨玻璃样小结节影,双 +肺门不大。纵隔窗示纵隔无偏移,心影及大血管形态正常,纵隔 +内未见肿块及肿大淋巴结。无胸腔积液及胸膜肥厚。扫及肝脏、 +左肾可见类圆形低密度影;胆囊缺如。双侧大脑半球对称,灰白 +质对比正常,未见局灶性密度异常,各脑室、脑池大小形态正常, +中线结构居中,幕下小脑、脑干无明确异常。 +胸部CT检查结论 * 两肺多发小结节,建议随访复查 +* 左肺舌叶少许慢性炎症 +* 肝囊肿 +* 胆囊术后,请结合相关病史 +* 左肾囊肿 +头颅CT平扫未见明显异常 +科室小结 +1、两肺多发小结节,建议随访复查 +2、左肺舌叶少许慢性炎症 +3、肝囊肿 +4、胆囊术后,请结合相关病史 +打印日期20250428 第16页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +5、左肾囊肿 +三十七、X光检查 +检查日期 20250421 检查医师 鞠兵 +检查名称 检查结果 参考值 单位 +X光检查所见 颈椎序列齐,生理曲线稍变直;颈椎椎体缘见骨质增生影,前纵 +韧带钙化;各椎间隙尚可;余未见明显异常。 +X光检查结论 * 颈椎退变 +科室小结 +1、颈椎退变 +打印日期20250428 第17页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +近3次体检项目图表 +血压(坐姿收缩压) +200 133 118 123 +最高值140 +100 结果 +最低值90 +0 +2023-05-19 2024-04-20 2025-04-18 +血压(坐姿舒张压) +87 77 +100 72 +最高值90 +50 结果 +最低值60 +0 +2023-05-19 2024-04-20 2025-04-18 +尿酸 +600 342 368 +285 最高值416 +400 结果 +200 最低值202 +0 +2023-05-19 2024-04-20 2025-04-18 +甘油三酯 +2 1.36 1.37 +0.96 最高值1.69 +1 结果 +最低值0.45 +0 +2023-05-19 2024-04-20 2025-04-18 +总胆固醇 +6 4.55 4.19 4.67 +最高值5.17 
+4 结果 +2 最低值2.33 +0 +2023-05-19 2024-04-20 2025-04-18 +打印日期20250428 第18页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +高密度脂蛋白胆固醇 +3 +1.44 1.46 1.52 最高值2.06 +2 结果 +1 最低值0.91 +0 +2023-05-19 2024-04-20 2025-04-18 +低密度脂蛋白胆固醇 +4 2.61 2.33 2.5 +最高值3.36 +2 结果 +最低值2.07 +0 +2023-05-19 2024-04-20 2025-04-18 +胰岛素(空腹) +30 +最高值24.9 +20 7.8 8.3 结果 +10 5 +最低值2.6 +0 +2023-05-19 2024-04-20 2025-04-18 +葡萄糖(空腹) +10 +5.77 4.96 5.41 最高值6.1 +5 结果 +最低值3.9 +0 +2023-05-19 2024-04-20 2025-04-18 +甲胎蛋白 +10 +最高值7.0 +5 结果 +0.5 0.7 0.5 最低值0 +0 +2023-05-19 2024-04-20 2025-04-18 +打印日期20250428 第19页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +癌胚抗原 +6 +最高值5 +4 1.5 结果 +1.2 1.3 +2 ---- 最低值0 +0 +2023-05-19 2024-04-20 2025-04-18 +糖类抗原19-9 +40 +最高值30 +20 9.9 结果 +5.2 3.6 最低值0 +0 +2023-05-19 2024-04-20 2025-04-18 +糖类抗原72-4 +10 +最高值6.899 +5 2 2.9 2.6 结果 +最低值0 +0 +2023-05-19 2024-04-20 2025-04-18 +糖类抗原125 +30 +最高值24 +20 7.5 结果 +6 4.9 +10 最低值0 +0 +2023-05-19 2024-04-20 2025-04-18 +总前列腺特异抗原 +6 +最高值4 +4 结果 +1.03 0.82 0.48 +2 最低值0 +0 +2023-05-19 2024-04-20 2025-04-18 +打印日期20250428 第20页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +游离前列腺特异抗原 +1 +最高值0.93 +0.5 0.32 0.25 0.25 结果 +最低值0 +0 +2023-05-19 2024-04-20 2025-04-18 +BMI +30 24.4 23.7 24 +最高值23.99 +20 结果 +10 最低值18.50 +0 +2023-05-19 2024-04-20 2025-04-18 +打印日期20250428 第21页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +血压检查 +年份 20250418 20240420 20230519 +检验项目 结果 结果 结果 +血压(坐姿收缩压) 123 118 133 +血压(坐姿舒张压) 77 72 87 +内科检查 +年份 20250418 20240420 20230519 +检验项目 结果 结果 结果 +既往病史 既往病史:高血压 既往病史:高血压 既往病史:高血压 +既往病史:尿酸高 / +心音 正常 正常 正常 +心律 心律齐 心律齐 心律齐 +/ 心动过缓每分钟52 / +次 +肝脾 肋下未及 肋下未及 肋下未及 +肺及呼吸道 未见明显异常 未见明显异常 未见明显异常 +皮肤、浅表淋巴结 无肿大,无压痛 无肿大,无压痛 无肿大,无压痛 +医生建议 无 无 无 +腹部 平软,无包块,无压 平软,无包块,无压 平软,无包块,无压 +痛 痛 痛 +胸部CT检查 +年份 20250420 20240422 20230522 +检验项目 结果 结果 结果 +胸部CT检查所见 胸廓对称,肋骨及胸 胸廓对称,肋骨及胸 胸廓对称,肋骨及胸 +壁软组织未见异常。 壁软组织未见异常。 
壁软组织未见异常。 +肺窗示左肺舌叶可 肺窗示双肺纹理清 肺窗示双肺纹理清 +见少许纤维条索影, 晰,走行自然,肺野 晰,走行自然,肺野 +两肺内可见2-4mm 透光度良好,左下肺 透光度良好,左下肺 +实性、磨玻璃样小结 背段见磨玻璃微小 背段见磨玻璃微小 +节影,双肺门不大。 结节,约5*4mm;余 结节,约5*4mm;余 +纵隔窗示纵隔无偏 双肺见多枚实性微 双肺见多枚实性微 +移,心影及大血管形 小结节影,直径小于 小结节影,直径小于 +态正常,纵隔内未见 5mm。双肺门不大。 5mm。双肺门不大。 +肿块及肿大淋巴结。 纵隔窗示纵隔无偏 纵隔窗示纵隔无偏 +无胸腔积液及胸膜 移,心影及大血管形 移,心影及大血管形 +肥厚。扫及肝脏、左 态正常,纵隔内未见 态正常,纵隔内未见 +肾可见类圆形低密 肿块及肿大淋巴结。 肿块及肿大淋巴结。 +打印日期20250428 第22页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +度影;胆囊缺如。双 无胸腔积液及胸膜 无胸腔积液及胸膜 +侧大脑半球对称,灰 肥厚。肝及左肾见囊 肥厚。肝及左肾见囊 +白质对比正常,未见 状低密度影。胆囊窝 状低密度影。胆囊窝 +局灶性密度异常,各 见银夹。 见银夹。 +脑室、脑池大小形态 +正常,中线结构居中, +幕下小脑、脑干无明 +确异常。 +胸部CT检查结论 两肺多发小结节,建 左下肺背段磨玻璃 左下肺背段磨玻璃 +议随访复查 微小结节影拟为良 微小结节影拟为良 +性病变 性病变 +左肺舌叶少许慢性 两肺内散在实性微 两肺内散在实性微 +炎症 小结节影拟为陈旧 小结节影拟为陈旧 +性炎性灶 性炎性灶 +肝囊肿 肝及左肾囊肿 肝及左肾囊肿 +胆囊术后,请结合相 胆囊窝银夹拟为术 胆囊窝银夹拟为术 +关病史 后改变 后改变 +左肾囊肿 / +头颅CT平扫未见明 / +显异常 +X光检查 +年份 20250421 20240423 / +检验项目 结果 结果 结果 +X光检查所见 颈椎序列齐,生理曲 颈椎序列,生理曲度 +线稍变直;颈椎椎体 变直,部分椎体边缘 +缘见骨质增生影,前 骨质增生,前纵韧带 +纵韧带钙化;各椎间 可见小斑点状高密 +隙尚可;余未见明显 度影;椎间隙尚可, +异常。 椎小关节未见明显 +异常。余无特殊。 +X光检查结论 颈椎退变 颈椎退行性改变 +打印日期20250428 第23页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +三十八、综合诊断及建议 +1. 既往病史:高血压 +(1)健康行为指导:建议低盐饮食,每日6克左右。低脂、低胆固醇膳食,建议多食绿叶类蔬菜、鲜奶 +及水果等,限制饮酒、戒烟。注意劳逸结合,减轻精神压力,保持心理平衡。 +(2)治疗建议:您本次收缩压和舒张压都在正常范围,请保持健康生活方式,定期复查血压,心内科随 +诊。 +2. 既往病史:尿酸高 +建议您结合临床,定期复查血尿酸。 +3. 鼻中隔糜烂 +请您耳鼻喉科进一步检查治疗。 +4. 双眼白内障初期 +建议您定期复查,眼科随诊。 +5. 双眼视网膜动脉硬化I +建议您改善微循环,定期查血压。 +6. 中度牙周炎 +建议您全口牙周治疗。 +7. 红细胞计数降低,红细胞比积偏低 +常见各种贫血,建议定期复查,必要时请到血液科随诊。 +8. 血红蛋白量降低 +请您专科进一步检查贫血原因并作相应治疗。 +9. 乙肝核心抗体阳性 +乙肝病毒感染后,HBsAg已消失,HBsAb尚未出现的窗口期,传染性弱或无传染性。建议避免疲劳、避免 +饮酒,必要时专科进一步复查随诊。 +10.25-羟基维生素D血药浓度测定偏低 +25-羟基维生素D血药浓度测定偏低意味着体内的维生素D水平缺乏或不足。这可能与摄入不足、日照不 +足、肠道吸收不良、肝肾疾病等因素有关。25-羟基维生素D偏低会影响钙的吸收,增加骨骼疾病、心血 +管疾病风险,并可能影响免疫力和情绪。建议您结合临床,定期复查,必要时三甲医院专科进一步检查偏 +低原因。 +11. 
维生素K1血药浓度测定偏高 +维生素K1血药浓度测定偏高可能是近期绿色蔬菜食用量较多,如菠菜等,或者是长期、大量服用了含有 +打印日期20250428 第24页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +维生素K1的保健品导致。建议结合临床,定期复查,必要时进一步检查偏高原因,专科随诊。 +12. 维生素B1血药浓度测定偏低 +维生素B1血药浓度测定偏低主要原因包括饮食摄入不足、吸收障碍、代谢需求增加等。建议结合临床, +定期复查,必要时进一步检查,明确维生素B1血药浓度测定偏低的原因,专科随诊。 +13. 维生素B6血药浓度测定偏低 +维生素B6血药浓度测定偏低主要原因包括饮食摄入不足、吸收障碍、代谢需求增加、药物影响及遗传因 +素等。建议结合临床,定期复查,适当补充维生素B6片,必要时进一步检查,明确维生素B6血药浓度测 +定偏低的原因,专科随诊。 +14. 主动脉瓣返流(轻微),二尖瓣返流(轻度),三尖瓣返流(轻度),左室松弛性异常 +建议您结合临床,定期复查,心内科随诊。 +15. B超、CT检查:肝囊肿 +根据您此次检查结果,建议年度复查。肝囊肿如果直径>100mm,并有腹部胀痛者可专科穿刺或手术治疗。 +16.B超:胆囊切除术后。CT:胆囊术后 +建议您定期B超复查专科随诊。 +17.B超、CT检查:左侧肾囊肿 +根据您此次检查结果,建议定期复查,泌尿外科随诊。肾囊肿如果直径>50mm应考虑泌尿外科治疗。 +18.双侧多发性肾结石 +(1)肾结石是因为尿液中钙、草酸、尿酸等成石物质浓度升高,在肾内析出结晶并积聚且逐渐增大而成。 +(2)建议平时多饮水,多食蔬菜水果,不饮浓茶。高尿酸者应避免高嘌呤食物如动物内脏、蛤蟹类、豆 +制品类、啤酒、海鲜等。 +(3)建议结合临床,如有腹痛或腰痛等不适症状时,请去泌尿外科就诊。 +19. 前列腺轻度增生 +无症状者无需处理,定期复查。 +20. 双侧椎动脉流速减慢 +建议您结合临床,定期复查,心内科随诊。 +21. 体质指数偏高 +建议您改善生活方式,饮食控制,适当运动,以减轻体重。 +22. 骨质减少 +建议多食用含钙量高的食物如牛奶、虾等,同时可补充钙剂及维生素D等以促进钙的吸收,适量锻炼,防 +止摔倒,多晒太阳,骨科随诊。 +打印日期20250428 第25页,共26页 +健康管理体检报告 +姓名 姚友胜 性别男 体检单号1125041700091 年龄59 +身份证320902196601223511 检查日期20250418 +23. 两肺多发小结节;左肺舌叶少许慢性炎症 +建议您结合临床,定期复查,呼吸科随诊。 +24. 
颈椎退变 +1.请结合临床,年度复查。 +2.有头晕或颈部、上肢有僵麻感等症状时需及时去医院就诊,可选择针灸,理疗或牵引等方法治疗。 +3.平时注意避免过分疲劳,并选用合适高度和弧度的枕头,注意颈部适度活动及保暖。 +备注: +1.“结合临床,定期复查”或“定期复查”指如无明显身体不适症状者,一般6-12个月复查;如出现身体 +不适症状情况请及时去医院就诊。 +2.“随诊”说明您身体有了病变趋势,需要您高度重视,如出现身体不适症状情况,请及时去医院进一步 +检查、就诊。 +3.“近期复查”是指一个月内复查。 +主检医师:买晓配 +打印日期20250428 第26页,共26页 \ No newline at end of file diff --git a/backend/test_step3.docx b/backend/test_step3.docx new file mode 100644 index 0000000..ff08905 Binary files /dev/null and b/backend/test_step3.docx differ diff --git a/backend/xml_safe_save.py b/backend/xml_safe_save.py new file mode 100644 index 0000000..7603c5a --- /dev/null +++ b/backend/xml_safe_save.py @@ -0,0 +1,177 @@ +""" +Safe-save module: handle XML elements precisely with lxml +""" +import zipfile +import shutil +import os +import re +from pathlib import Path +from lxml import etree + + +def safe_save(doc, output_path, template_path): + """ + Safe save: handle the XML precisely with lxml + + Strategy: + 1. Save the document to a temporary file first + 2. Parse the XML with lxml + 3. Copy the first four pages of elements from the template (up to and including Client Health Program) + 4. Copy everything after Client Health Program from the processed file + 5. Merge and save + """ + import tempfile + + output_path = Path(output_path) + template_path = Path(template_path) + + ns = {'w': 'http://schemas.openxmlformats.org/wordprocessingml/2006/main'} + + temp_fd, temp_path = tempfile.mkstemp(suffix='.docx') + os.close(temp_fd) + + try: + # 1. Save to a temporary file + doc.save(temp_path) + + # 2. Read the template XML + with zipfile.ZipFile(template_path, 'r') as z: + template_xml = z.read('word/document.xml') + template_tree = etree.fromstring(template_xml) + template_body = template_tree.find('.//w:body', ns) + + # 3. Read the processed XML + with zipfile.ZipFile(temp_path, 'r') as z: + modified_xml = z.read('word/document.xml') + modified_tree = etree.fromstring(modified_xml) + modified_body = modified_tree.find('.//w:body', ns) + + if template_body is None or modified_body is None: + print(" [安全保存] 无法解析 XML body") + shutil.copy(temp_path, output_path) + return + + template_children = list(template_body) + modified_children = list(modified_body) + + # 4.
Find the protection boundary in the template (just after Client Health Program) + boundary_pos = -1 + for i, elem in enumerate(template_children): + text = ''.join(elem.itertext()).strip() + if 'Client Health Program' in text or '客户健康方案' in text: + boundary_pos = i + 1 # include this element + break + + if boundary_pos < 0: + # Fall back to the first 80 elements + boundary_pos = min(80, len(template_children)) + + # 5. Find where the data starts in the processed file + # Key change: start right after Client Health Program rather than at health report analysis, + # so that content such as Functional Medical Health Advice is preserved + data_start_pos = -1 + + # First try to locate Client Health Program + for i, elem in enumerate(modified_children): + text = ''.join(elem.itertext()).strip() + if 'Client Health Program' in text or '客户健康方案' in text: + data_start_pos = i + 1 # start right after Client Health Program + print(f" [安全保存] 找到 Client Health Program 位置: {i}") + break + + # If it is not found, fall back to alternative keywords + if data_start_pos < 0: + start_keywords = ['health report analysis', '健康报告分析', + 'abnormal index', '异常指标', + 'functional medical health advice', '功能医学健康建议', + 'urine detection', '尿液检测'] + + for i, elem in enumerate(modified_children): + text = ''.join(elem.itertext()).strip().lower() + if any(kw in text for kw in start_keywords): + data_start_pos = i + break + + if data_start_pos < 0: + data_start_pos = boundary_pos + + print(f" [安全保存] 边界位置:{boundary_pos}, 数据起始:{data_start_pos}") + + # 6. Clear the template body and rebuild it + # Keep the template's sectPr element (it holds the footer references) + sectPr = None + for elem in template_children: + if elem.tag.endswith('}sectPr'): + sectPr = etree.fromstring(etree.tostring(elem)) + break + + # Clear the body + for elem in list(template_body): + template_body.remove(elem) + + # 7.
Append the first boundary_pos elements of the template (the first four pages) + # Re-read the template to get the original elements + with zipfile.ZipFile(template_path, 'r') as z: + orig_template_xml = z.read('word/document.xml') + orig_template_tree = etree.fromstring(orig_template_xml) + orig_template_body = orig_template_tree.find('.//w:body', ns) + orig_template_children = list(orig_template_body) + + protected_count = 0 + for i in range(min(boundary_pos, len(orig_template_children))): + elem = orig_template_children[i] + if elem.tag.endswith('}sectPr'): + continue + elem_copy = etree.fromstring(etree.tostring(elem)) + template_body.append(elem_copy) + protected_count += 1 + + # 8. Append the data part of the processed file (starting right after Client Health Program) + data_count = 0 + for i in range(data_start_pos, len(modified_children)): + elem = modified_children[i] + if elem.tag.endswith('}sectPr'): + continue + elem_copy = etree.fromstring(etree.tostring(elem)) + template_body.append(elem_copy) + data_count += 1 + + # 9. Append sectPr + if sectPr is not None: + template_body.append(sectPr) + + print(f" [安全保存] 保护部分:{protected_count}, 数据部分:{data_count}") + + # 10. Serialize the XML + new_xml = etree.tostring(template_tree, xml_declaration=True, encoding='UTF-8', standalone=True) + + # 11. Create the output file based on the template + temp_result = str(output_path) + '.temp.docx' + with zipfile.ZipFile(template_path, 'r') as zin: + with zipfile.ZipFile(temp_result, 'w', zipfile.ZIP_DEFLATED) as zout: + for item in zin.infolist(): + if item.filename == 'word/document.xml': + zout.writestr(item, new_xml) + else: + zout.writestr(item, zin.read(item.filename)) + + # 12.
Move to the final location + if output_path.exists(): + output_path.unlink() + shutil.move(temp_result, output_path) + + print(f" [安全保存] ✓ 完成") + + except Exception as e: + print(f" [安全保存] 错误: {e}") + import traceback + traceback.print_exc() + # Fall back to a plain save + doc.save(output_path) + finally: + for f in [temp_path, str(output_path) + '.temp.docx']: + if os.path.exists(f): + try: + os.remove(f) + except OSError: + pass diff --git a/frontend/index.html b/frontend/index.html new file mode 100644 index 0000000..54fec40 --- /dev/null +++ b/frontend/index.html @@ -0,0 +1,13 @@ + + + + + + + 医疗报告分析系统 + + +

+ + + diff --git a/frontend/package-lock.json b/frontend/package-lock.json new file mode 100644 index 0000000..a957fc3 --- /dev/null +++ b/frontend/package-lock.json @@ -0,0 +1,2889 @@ +{ + "name": "medical-report-analyzer", + "version": "1.0.0", + "lockfileVersion": 3, + "requires": true, + "packages": { + "": { + "name": "medical-report-analyzer", + "version": "1.0.0", + "dependencies": { + "axios": "^1.6.0", + "lucide-react": "^0.292.0", + "react": "^18.2.0", + "react-dom": "^18.2.0" + }, + "devDependencies": { + "@types/react": "^18.2.37", + "@types/react-dom": "^18.2.15", + "@vitejs/plugin-react": "^4.2.0", + "autoprefixer": "^10.4.16", + "postcss": "^8.4.31", + "tailwindcss": "^3.3.5", + "vite": "^5.0.0" + } + }, + "node_modules/@alloc/quick-lru": { + "version": "5.2.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@alloc/quick-lru/-/quick-lru-5.2.0.tgz", + "integrity": "sha512-UrcABB+4bUrFABwbluTIBErXwvbsU/V7TZWfmbgJfbkwiBuziS9gxdODUyuiecfdGQ85jglMW6juS3+z5TsKLw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=10" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/@babel/code-frame": { + "version": "7.27.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/code-frame/-/code-frame-7.27.1.tgz", + "integrity": "sha512-cjQ7ZlQ0Mv3b47hABuTevyTuYN4i+loJKGeV9flcCgIK37cCXRh+L1bd3iBHlynerhQ7BhCkn2BPbQUL+rGqFg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-validator-identifier": "^7.27.1", + "js-tokens": "^4.0.0", + "picocolors": "^1.1.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/compat-data": { + "version": "7.28.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/compat-data/-/compat-data-7.28.5.tgz", + "integrity": "sha512-6uFXyCayocRbqhZOB+6XcuZbkMNimwfVGFji8CTZnCzOHVGvDqzvitu1re2AU5LROliz7eQPhB8CpAMvnx9EjA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + 
}, + "node_modules/@babel/core": { + "version": "7.28.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/core/-/core-7.28.5.tgz", + "integrity": "sha512-e7jT4DxYvIDLk1ZHmU/m/mB19rex9sv0c2ftBtjSBv+kVM/902eh0fINUzD7UwLLNR+jU585GxUJ8/EBfAM5fw==", + "dev": true, + "license": "MIT", + "peer": true, + "dependencies": { + "@babel/code-frame": "^7.27.1", + "@babel/generator": "^7.28.5", + "@babel/helper-compilation-targets": "^7.27.2", + "@babel/helper-module-transforms": "^7.28.3", + "@babel/helpers": "^7.28.4", + "@babel/parser": "^7.28.5", + "@babel/template": "^7.27.2", + "@babel/traverse": "^7.28.5", + "@babel/types": "^7.28.5", + "@jridgewell/remapping": "^2.3.5", + "convert-source-map": "^2.0.0", + "debug": "^4.1.0", + "gensync": "^1.0.0-beta.2", + "json5": "^2.2.3", + "semver": "^6.3.1" + }, + "engines": { + "node": ">=6.9.0" + }, + "funding": { + "type": "opencollective", + "url": "https://opencollective.com/babel" + } + }, + "node_modules/@babel/generator": { + "version": "7.28.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/generator/-/generator-7.28.5.tgz", + "integrity": "sha512-3EwLFhZ38J4VyIP6WNtt2kUdW9dokXA9Cr4IVIFHuCpZ3H8/YFOl5JjZHisrn1fATPBmKKqXzDFvh9fUwHz6CQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/parser": "^7.28.5", + "@babel/types": "^7.28.5", + "@jridgewell/gen-mapping": "^0.3.12", + "@jridgewell/trace-mapping": "^0.3.28", + "jsesc": "^3.0.2" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-compilation-targets": { + "version": "7.27.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helper-compilation-targets/-/helper-compilation-targets-7.27.2.tgz", + "integrity": "sha512-2+1thGUUWWjLTYTHZWK1n8Yga0ijBz1XAhUXcKy81rd5g6yh7hGqMp45v7cadSbEHc9G3OTv45SyneRN3ps4DQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/compat-data": "^7.27.2", + "@babel/helper-validator-option": "^7.27.1", + "browserslist": "^4.24.0", 
+ "lru-cache": "^5.1.1", + "semver": "^6.3.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-globals": { + "version": "7.28.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helper-globals/-/helper-globals-7.28.0.tgz", + "integrity": "sha512-+W6cISkXFa1jXsDEdYA8HeevQT/FULhxzR99pxphltZcVaugps53THCeiWA8SguxxpSp3gKPiuYfSWopkLQ4hw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-module-imports": { + "version": "7.27.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helper-module-imports/-/helper-module-imports-7.27.1.tgz", + "integrity": "sha512-0gSFWUPNXNopqtIPQvlD5WgXYI5GY2kP2cCvoT8kczjbfcfuIljTbcWrulD1CIPIX2gt1wghbDy08yE1p+/r3w==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/traverse": "^7.27.1", + "@babel/types": "^7.27.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-module-transforms": { + "version": "7.28.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helper-module-transforms/-/helper-module-transforms-7.28.3.tgz", + "integrity": "sha512-gytXUbs8k2sXS9PnQptz5o0QnpLL51SwASIORY6XaBKF88nsOT0Zw9szLqlSGQDP/4TljBAD5y98p2U1fqkdsw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-module-imports": "^7.27.1", + "@babel/helper-validator-identifier": "^7.27.1", + "@babel/traverse": "^7.28.3" + }, + "engines": { + "node": ">=6.9.0" + }, + "peerDependencies": { + "@babel/core": "^7.0.0" + } + }, + "node_modules/@babel/helper-plugin-utils": { + "version": "7.27.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helper-plugin-utils/-/helper-plugin-utils-7.27.1.tgz", + "integrity": "sha512-1gn1Up5YXka3YYAHGKpbideQ5Yjf1tDa9qYcgysz+cNCXukyLl6DjPXhD3VRwSb8c0J9tA4b2+rHEZtc6R0tlw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-string-parser": { + 
"version": "7.27.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helper-string-parser/-/helper-string-parser-7.27.1.tgz", + "integrity": "sha512-qMlSxKbpRlAridDExk92nSobyDdpPijUq2DW6oDnUqd0iOGxmQjyqhMIihI9+zv4LPyZdRje2cavWPbCbWm3eA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-validator-identifier": { + "version": "7.28.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helper-validator-identifier/-/helper-validator-identifier-7.28.5.tgz", + "integrity": "sha512-qSs4ifwzKJSV39ucNjsvc6WVHs6b7S03sOh2OcHF9UHfVPqWWALUsNUVzhSBiItjRZoLHx7nIarVjqKVusUZ1Q==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helper-validator-option": { + "version": "7.27.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helper-validator-option/-/helper-validator-option-7.27.1.tgz", + "integrity": "sha512-YvjJow9FxbhFFKDSuFnVCe2WxXk1zWc22fFePVNEaWJEu8IrZVlda6N0uHwzZrUM1il7NC9Mlp4MaJYbYd9JSg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/helpers": { + "version": "7.28.4", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/helpers/-/helpers-7.28.4.tgz", + "integrity": "sha512-HFN59MmQXGHVyYadKLVumYsA9dBFun/ldYxipEjzA4196jpLZd8UjEEBLkbEkvfYreDqJhZxYAWFPtrfhNpj4w==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/template": "^7.27.2", + "@babel/types": "^7.28.4" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/parser": { + "version": "7.28.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/parser/-/parser-7.28.5.tgz", + "integrity": "sha512-KKBU1VGYR7ORr3At5HAtUQ+TV3SzRCXmA/8OdDZiLDBIZxVyzXuztPjfLd3BV1PRAQGCMWWSHYhL0F8d5uHBDQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/types": "^7.28.5" + }, + "bin": { + "parser": "bin/babel-parser.js" + }, + 
"engines": { + "node": ">=6.0.0" + } + }, + "node_modules/@babel/plugin-transform-react-jsx-self": { + "version": "7.27.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/plugin-transform-react-jsx-self/-/plugin-transform-react-jsx-self-7.27.1.tgz", + "integrity": "sha512-6UzkCs+ejGdZ5mFFC/OCUrv028ab2fp1znZmCZjAOBKiBK2jXD1O+BPSfX8X2qjJ75fZBMSnQn3Rq2mrBJK2mw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-plugin-utils": "^7.27.1" + }, + "engines": { + "node": ">=6.9.0" + }, + "peerDependencies": { + "@babel/core": "^7.0.0-0" + } + }, + "node_modules/@babel/plugin-transform-react-jsx-source": { + "version": "7.27.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/plugin-transform-react-jsx-source/-/plugin-transform-react-jsx-source-7.27.1.tgz", + "integrity": "sha512-zbwoTsBruTeKB9hSq73ha66iFeJHuaFkUbwvqElnygoNbj/jHRsSeokowZFN3CZ64IvEqcmmkVe89OPXc7ldAw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-plugin-utils": "^7.27.1" + }, + "engines": { + "node": ">=6.9.0" + }, + "peerDependencies": { + "@babel/core": "^7.0.0-0" + } + }, + "node_modules/@babel/template": { + "version": "7.27.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/template/-/template-7.27.2.tgz", + "integrity": "sha512-LPDZ85aEJyYSd18/DkjNh4/y1ntkE5KwUHWTiqgRxruuZL2F1yuHligVHLvcHY2vMHXttKFpJn6LwfI7cw7ODw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/code-frame": "^7.27.1", + "@babel/parser": "^7.27.2", + "@babel/types": "^7.27.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/traverse": { + "version": "7.28.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/traverse/-/traverse-7.28.5.tgz", + "integrity": "sha512-TCCj4t55U90khlYkVV/0TfkJkAkUg3jZFA3Neb7unZT8CPok7iiRfaX0F+WnqWqt7OxhOn0uBKXCw4lbL8W0aQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/code-frame": "^7.27.1", + "@babel/generator": 
"^7.28.5", + "@babel/helper-globals": "^7.28.0", + "@babel/parser": "^7.28.5", + "@babel/template": "^7.27.2", + "@babel/types": "^7.28.5", + "debug": "^4.3.1" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@babel/types": { + "version": "7.28.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@babel/types/-/types-7.28.5.tgz", + "integrity": "sha512-qQ5m48eI/MFLQ5PxQj4PFaprjyCTLI37ElWMmNs0K8Lk3dVeOdNpB3ks8jc7yM5CDmVC73eMVk/trk3fgmrUpA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/helper-string-parser": "^7.27.1", + "@babel/helper-validator-identifier": "^7.28.5" + }, + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/@esbuild/aix-ppc64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/aix-ppc64/-/aix-ppc64-0.21.5.tgz", + "integrity": "sha512-1SDgH6ZSPTlggy1yI6+Dbkiz8xzpHJEVAlF/AM1tHPLsf5STom9rwtjE4hKAF20FfXXNTFqEYXyJNWh1GiZedQ==", + "cpu": [ + "ppc64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "aix" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/android-arm": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/android-arm/-/android-arm-0.21.5.tgz", + "integrity": "sha512-vCPvzSjpPHEi1siZdlvAlsPxXl7WbOVUBBAowWug4rJHb68Ox8KualB+1ocNvT5fjv6wpkX6o/iEpbDrf68zcg==", + "cpu": [ + "arm" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/android-arm64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/android-arm64/-/android-arm64-0.21.5.tgz", + "integrity": "sha512-c0uX9VAUBQ7dTDCjq+wdyGLowMdtR/GoC2U5IYk/7D1H1JYC0qseD7+11iMP2mRLN9RcCMRcjC4YMclCzGwS/A==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">=12" + } + }, + 
"node_modules/@esbuild/android-x64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/android-x64/-/android-x64-0.21.5.tgz", + "integrity": "sha512-D7aPRUUNHRBwHxzxRvp856rjUHRFW1SdQATKXH2hqA0kAZb1hKmi02OpYRacl0TxIGz/ZmXWlbZgjwWYaCakTA==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/darwin-arm64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/darwin-arm64/-/darwin-arm64-0.21.5.tgz", + "integrity": "sha512-DwqXqZyuk5AiWWf3UfLiRDJ5EDd49zg6O9wclZ7kUMv2WRFr4HKjXp/5t8JZ11QbQfUS6/cRCKGwYhtNAY88kQ==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/darwin-x64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/darwin-x64/-/darwin-x64-0.21.5.tgz", + "integrity": "sha512-se/JjF8NlmKVG4kNIuyWMV/22ZaerB+qaSi5MdrXtd6R08kvs2qCN4C09miupktDitvh8jRFflwGFBQcxZRjbw==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/freebsd-arm64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/freebsd-arm64/-/freebsd-arm64-0.21.5.tgz", + "integrity": "sha512-5JcRxxRDUJLX8JXp/wcBCy3pENnCgBR9bN6JsY4OmhfUtIHe3ZW0mawA7+RDAcMLrMIZaf03NlQiX9DGyB8h4g==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/freebsd-x64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/freebsd-x64/-/freebsd-x64-0.21.5.tgz", + "integrity": 
"sha512-J95kNBj1zkbMXtHVH29bBriQygMXqoVQOQYA+ISs0/2l3T9/kj42ow2mpqerRBxDJnmkUDCaQT/dfNXWX/ZZCQ==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/linux-arm": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-arm/-/linux-arm-0.21.5.tgz", + "integrity": "sha512-bPb5AHZtbeNGjCKVZ9UGqGwo8EUu4cLq68E95A53KlxAPRmUyYv2D6F0uUI65XisGOL1hBP5mTronbgo+0bFcA==", + "cpu": [ + "arm" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/linux-arm64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-arm64/-/linux-arm64-0.21.5.tgz", + "integrity": "sha512-ibKvmyYzKsBeX8d8I7MH/TMfWDXBF3db4qM6sy+7re0YXya+K1cem3on9XgdT2EQGMu4hQyZhan7TeQ8XkGp4Q==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/linux-ia32": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-ia32/-/linux-ia32-0.21.5.tgz", + "integrity": "sha512-YvjXDqLRqPDl2dvRODYmmhz4rPeVKYvppfGYKSNGdyZkA01046pLWyRKKI3ax8fbJoK5QbxblURkwK/MWY18Tg==", + "cpu": [ + "ia32" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/linux-loong64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-loong64/-/linux-loong64-0.21.5.tgz", + "integrity": "sha512-uHf1BmMG8qEvzdrzAqg2SIG/02+4/DHB6a9Kbya0XDvwDEKCoC8ZRWI5JJvNdUjtciBGFQ5PuBlpEOXQj+JQSg==", + "cpu": [ + "loong64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + 
"node_modules/@esbuild/linux-mips64el": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-mips64el/-/linux-mips64el-0.21.5.tgz", + "integrity": "sha512-IajOmO+KJK23bj52dFSNCMsz1QP1DqM6cwLUv3W1QwyxkyIWecfafnI555fvSGqEKwjMXVLokcV5ygHW5b3Jbg==", + "cpu": [ + "mips64el" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/linux-ppc64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-ppc64/-/linux-ppc64-0.21.5.tgz", + "integrity": "sha512-1hHV/Z4OEfMwpLO8rp7CvlhBDnjsC3CttJXIhBi+5Aj5r+MBvy4egg7wCbe//hSsT+RvDAG7s81tAvpL2XAE4w==", + "cpu": [ + "ppc64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/linux-riscv64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-riscv64/-/linux-riscv64-0.21.5.tgz", + "integrity": "sha512-2HdXDMd9GMgTGrPWnJzP2ALSokE/0O5HhTUvWIbD3YdjME8JwvSCnNGBnTThKGEB91OZhzrJ4qIIxk/SBmyDDA==", + "cpu": [ + "riscv64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/linux-s390x": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-s390x/-/linux-s390x-0.21.5.tgz", + "integrity": "sha512-zus5sxzqBJD3eXxwvjN1yQkRepANgxE9lgOW2qLnmr8ikMTphkjgXu1HR01K4FJg8h1kEEDAqDcZQtbrRnB41A==", + "cpu": [ + "s390x" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/linux-x64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/linux-x64/-/linux-x64-0.21.5.tgz", + "integrity": 
"sha512-1rYdTpyv03iycF1+BhzrzQJCdOuAOtaqHTWJZCWvijKD2N5Xu0TtVC8/+1faWqcP9iBCWOmjmhoH94dH82BxPQ==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/netbsd-x64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/netbsd-x64/-/netbsd-x64-0.21.5.tgz", + "integrity": "sha512-Woi2MXzXjMULccIwMnLciyZH4nCIMpWQAs049KEeMvOcNADVxo0UBIQPfSmxB3CWKedngg7sWZdLvLczpe0tLg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "netbsd" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/openbsd-x64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/openbsd-x64/-/openbsd-x64-0.21.5.tgz", + "integrity": "sha512-HLNNw99xsvx12lFBUwoT8EVCsSvRNDVxNpjZ7bPn947b8gJPzeHWyNVhFsaerc0n3TsbOINvRP2byTZ5LKezow==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "openbsd" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/sunos-x64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/sunos-x64/-/sunos-x64-0.21.5.tgz", + "integrity": "sha512-6+gjmFpfy0BHU5Tpptkuh8+uw3mnrvgs+dSPQXQOv3ekbordwnzTVEb4qnIvQcYXq6gzkyTnoZ9dZG+D4garKg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "sunos" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/win32-arm64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/win32-arm64/-/win32-arm64-0.21.5.tgz", + "integrity": "sha512-Z0gOTd75VvXqyq7nsl93zwahcTROgqvuAcYDUr+vOv8uHhNSKROyU961kgtCD1e95IqPKSQKH7tBTslnS3tA8A==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=12" + } + }, + 
"node_modules/@esbuild/win32-ia32": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/win32-ia32/-/win32-ia32-0.21.5.tgz", + "integrity": "sha512-SWXFF1CL2RVNMaVs+BBClwtfZSvDgtL//G/smwAc5oVK/UPu2Gu9tIaRgFmYFFKrmg3SyAjSrElf0TiJ1v8fYA==", + "cpu": [ + "ia32" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@esbuild/win32-x64": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@esbuild/win32-x64/-/win32-x64-0.21.5.tgz", + "integrity": "sha512-tQd/1efJuzPC6rCFwEvLtci/xNFcTZknmXs98FYDfGE4wP9ClFV98nyKrzJKVPMhdDnjzLhdUyMX4PsQAPjwIw==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ], + "engines": { + "node": ">=12" + } + }, + "node_modules/@jridgewell/gen-mapping": { + "version": "0.3.13", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@jridgewell/gen-mapping/-/gen-mapping-0.3.13.tgz", + "integrity": "sha512-2kkt/7niJ6MgEPxF0bYdQ6etZaA+fQvDcLKckhy1yIQOzaoKjBBjSj63/aLVjYE3qhRt5dvM+uUyfCg6UKCBbA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@jridgewell/sourcemap-codec": "^1.5.0", + "@jridgewell/trace-mapping": "^0.3.24" + } + }, + "node_modules/@jridgewell/remapping": { + "version": "2.3.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@jridgewell/remapping/-/remapping-2.3.5.tgz", + "integrity": "sha512-LI9u/+laYG4Ds1TDKSJW2YPrIlcVYOwi2fUC6xB43lueCjgxV4lffOCZCtYFiH6TNOX+tQKXx97T4IKHbhyHEQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@jridgewell/gen-mapping": "^0.3.5", + "@jridgewell/trace-mapping": "^0.3.24" + } + }, + "node_modules/@jridgewell/resolve-uri": { + "version": "3.1.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@jridgewell/resolve-uri/-/resolve-uri-3.1.2.tgz", + "integrity": 
"sha512-bRISgCIjP20/tbWSPWMEi54QVPRZExkuD9lJL+UIxUKtwVJA8wW1Trb1jMs1RFXo1CBTNZ/5hpC9QvmKWdopKw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.0.0" + } + }, + "node_modules/@jridgewell/sourcemap-codec": { + "version": "1.5.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@jridgewell/sourcemap-codec/-/sourcemap-codec-1.5.5.tgz", + "integrity": "sha512-cYQ9310grqxueWbl+WuIUIaiUaDcj7WOq5fVhEljNVgRfOUhY9fy2zTvfoqWsnebh8Sl70VScFbICvJnLKB0Og==", + "dev": true, + "license": "MIT" + }, + "node_modules/@jridgewell/trace-mapping": { + "version": "0.3.31", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@jridgewell/trace-mapping/-/trace-mapping-0.3.31.tgz", + "integrity": "sha512-zzNR+SdQSDJzc8joaeP8QQoCQr8NuYx2dIIytl1QeBEZHJ9uW6hebsrYgbz8hJwUQao3TWCMtmfV8Nu1twOLAw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@jridgewell/resolve-uri": "^3.1.0", + "@jridgewell/sourcemap-codec": "^1.4.14" + } + }, + "node_modules/@nodelib/fs.scandir": { + "version": "2.1.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@nodelib/fs.scandir/-/fs.scandir-2.1.5.tgz", + "integrity": "sha512-vq24Bq3ym5HEQm2NKCr3yXDwjc7vTsEThRDnkp2DK9p1uqLR+DHurm/NOTo0KG7HYHU7eppKZj3MyqYuMBf62g==", + "dev": true, + "license": "MIT", + "dependencies": { + "@nodelib/fs.stat": "2.0.5", + "run-parallel": "^1.1.9" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/@nodelib/fs.stat": { + "version": "2.0.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@nodelib/fs.stat/-/fs.stat-2.0.5.tgz", + "integrity": "sha512-RkhPPp2zrqDAQA/2jNhnztcPAlv64XdhIp7a7454A5ovI7Bukxgt7MX7udwAu3zg1DcpPU0rz3VV1SeaqvY4+A==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 8" + } + }, + "node_modules/@nodelib/fs.walk": { + "version": "1.2.8", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@nodelib/fs.walk/-/fs.walk-1.2.8.tgz", + "integrity": 
"sha512-oGB+UxlgWcgQkgwo8GcEGwemoTFt3FIO9ababBmaGwXIoBKZ+GTy0pP185beGg7Llih/NSHSV2XAs1lnznocSg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@nodelib/fs.scandir": "2.1.5", + "fastq": "^1.6.0" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/@rolldown/pluginutils": { + "version": "1.0.0-beta.27", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rolldown/pluginutils/-/pluginutils-1.0.0-beta.27.tgz", + "integrity": "sha512-+d0F4MKMCbeVUJwG96uQ4SgAznZNSq93I3V+9NHA4OpvqG8mRCpGdKmK8l/dl02h2CCDHwW2FqilnTyDcAnqjA==", + "dev": true, + "license": "MIT" + }, + "node_modules/@rollup/rollup-android-arm-eabi": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-android-arm-eabi/-/rollup-android-arm-eabi-4.53.3.tgz", + "integrity": "sha512-mRSi+4cBjrRLoaal2PnqH82Wqyb+d3HsPUN/W+WslCXsZsyHa9ZeQQX/pQsZaVIWDkPcpV6jJ+3KLbTbgnwv8w==", + "cpu": [ + "arm" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ] + }, + "node_modules/@rollup/rollup-android-arm64": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-android-arm64/-/rollup-android-arm64-4.53.3.tgz", + "integrity": "sha512-CbDGaMpdE9sh7sCmTrTUyllhrg65t6SwhjlMJsLr+J8YjFuPmCEjbBSx4Z/e4SmDyH3aB5hGaJUP2ltV/vcs4w==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "android" + ] + }, + "node_modules/@rollup/rollup-darwin-arm64": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-darwin-arm64/-/rollup-darwin-arm64-4.53.3.tgz", + "integrity": "sha512-Nr7SlQeqIBpOV6BHHGZgYBuSdanCXuw09hon14MGOLGmXAFYjx1wNvquVPmpZnl0tLjg25dEdr4IQ6GgyToCUA==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ] + }, + "node_modules/@rollup/rollup-darwin-x64": { + "version": "4.53.3", + "resolved": 
"https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-darwin-x64/-/rollup-darwin-x64-4.53.3.tgz", + "integrity": "sha512-DZ8N4CSNfl965CmPktJ8oBnfYr3F8dTTNBQkRlffnUarJ2ohudQD17sZBa097J8xhQ26AwhHJ5mvUyQW8ddTsQ==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ] + }, + "node_modules/@rollup/rollup-freebsd-arm64": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-freebsd-arm64/-/rollup-freebsd-arm64-4.53.3.tgz", + "integrity": "sha512-yMTrCrK92aGyi7GuDNtGn2sNW+Gdb4vErx4t3Gv/Tr+1zRb8ax4z8GWVRfr3Jw8zJWvpGHNpss3vVlbF58DZ4w==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ] + }, + "node_modules/@rollup/rollup-freebsd-x64": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-freebsd-x64/-/rollup-freebsd-x64-4.53.3.tgz", + "integrity": "sha512-lMfF8X7QhdQzseM6XaX0vbno2m3hlyZFhwcndRMw8fbAGUGL3WFMBdK0hbUBIUYcEcMhVLr1SIamDeuLBnXS+Q==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "freebsd" + ] + }, + "node_modules/@rollup/rollup-linux-arm-gnueabihf": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-arm-gnueabihf/-/rollup-linux-arm-gnueabihf-4.53.3.tgz", + "integrity": "sha512-k9oD15soC/Ln6d2Wv/JOFPzZXIAIFLp6B+i14KhxAfnq76ajt0EhYc5YPeX6W1xJkAdItcVT+JhKl1QZh44/qw==", + "cpu": [ + "arm" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-arm-musleabihf": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-arm-musleabihf/-/rollup-linux-arm-musleabihf-4.53.3.tgz", + "integrity": "sha512-vTNlKq+N6CK/8UktsrFuc+/7NlEYVxgaEgRXVUVK258Z5ymho29skzW1sutgYjqNnquGwVUObAaxae8rZ6YMhg==", + "cpu": [ + "arm" + ], + "dev": true, + "license": 
"MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-arm64-gnu": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-arm64-gnu/-/rollup-linux-arm64-gnu-4.53.3.tgz", + "integrity": "sha512-RGrFLWgMhSxRs/EWJMIFM1O5Mzuz3Xy3/mnxJp/5cVhZ2XoCAxJnmNsEyeMJtpK+wu0FJFWz+QF4mjCA7AUQ3w==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-arm64-musl": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-arm64-musl/-/rollup-linux-arm64-musl-4.53.3.tgz", + "integrity": "sha512-kASyvfBEWYPEwe0Qv4nfu6pNkITLTb32p4yTgzFCocHnJLAHs+9LjUu9ONIhvfT/5lv4YS5muBHyuV84epBo/A==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-loong64-gnu": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-loong64-gnu/-/rollup-linux-loong64-gnu-4.53.3.tgz", + "integrity": "sha512-JiuKcp2teLJwQ7vkJ95EwESWkNRFJD7TQgYmCnrPtlu50b4XvT5MOmurWNrCj3IFdyjBQ5p9vnrX4JM6I8OE7g==", + "cpu": [ + "loong64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-ppc64-gnu": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-ppc64-gnu/-/rollup-linux-ppc64-gnu-4.53.3.tgz", + "integrity": "sha512-EoGSa8nd6d3T7zLuqdojxC20oBfNT8nexBbB/rkxgKj5T5vhpAQKKnD+h3UkoMuTyXkP5jTjK/ccNRmQrPNDuw==", + "cpu": [ + "ppc64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-riscv64-gnu": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-riscv64-gnu/-/rollup-linux-riscv64-gnu-4.53.3.tgz", + 
"integrity": "sha512-4s+Wped2IHXHPnAEbIB0YWBv7SDohqxobiiPA1FIWZpX+w9o2i4LezzH/NkFUl8LRci/8udci6cLq+jJQlh+0g==", + "cpu": [ + "riscv64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-riscv64-musl": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-riscv64-musl/-/rollup-linux-riscv64-musl-4.53.3.tgz", + "integrity": "sha512-68k2g7+0vs2u9CxDt5ktXTngsxOQkSEV/xBbwlqYcUrAVh6P9EgMZvFsnHy4SEiUl46Xf0IObWVbMvPrr2gw8A==", + "cpu": [ + "riscv64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-s390x-gnu": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-s390x-gnu/-/rollup-linux-s390x-gnu-4.53.3.tgz", + "integrity": "sha512-VYsFMpULAz87ZW6BVYw3I6sWesGpsP9OPcyKe8ofdg9LHxSbRMd7zrVrr5xi/3kMZtpWL/wC+UIJWJYVX5uTKg==", + "cpu": [ + "s390x" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-x64-gnu": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-x64-gnu/-/rollup-linux-x64-gnu-4.53.3.tgz", + "integrity": "sha512-3EhFi1FU6YL8HTUJZ51imGJWEX//ajQPfqWLI3BQq4TlvHy4X0MOr5q3D2Zof/ka0d5FNdPwZXm3Yyib/UEd+w==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-linux-x64-musl": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-linux-x64-musl/-/rollup-linux-x64-musl-4.53.3.tgz", + "integrity": "sha512-eoROhjcc6HbZCJr+tvVT8X4fW3/5g/WkGvvmwz/88sDtSJzO7r/blvoBDgISDiCjDRZmHpwud7h+6Q9JxFwq1Q==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "linux" + ] + }, + "node_modules/@rollup/rollup-openharmony-arm64": { + 
"version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-openharmony-arm64/-/rollup-openharmony-arm64-4.53.3.tgz", + "integrity": "sha512-OueLAWgrNSPGAdUdIjSWXw+u/02BRTcnfw9PN41D2vq/JSEPnJnVuBgw18VkN8wcd4fjUs+jFHVM4t9+kBSNLw==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "openharmony" + ] + }, + "node_modules/@rollup/rollup-win32-arm64-msvc": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-win32-arm64-msvc/-/rollup-win32-arm64-msvc-4.53.3.tgz", + "integrity": "sha512-GOFuKpsxR/whszbF/bzydebLiXIHSgsEUp6M0JI8dWvi+fFa1TD6YQa4aSZHtpmh2/uAlj/Dy+nmby3TJ3pkTw==", + "cpu": [ + "arm64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ] + }, + "node_modules/@rollup/rollup-win32-ia32-msvc": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-win32-ia32-msvc/-/rollup-win32-ia32-msvc-4.53.3.tgz", + "integrity": "sha512-iah+THLcBJdpfZ1TstDFbKNznlzoxa8fmnFYK4V67HvmuNYkVdAywJSoteUszvBQ9/HqN2+9AZghbajMsFT+oA==", + "cpu": [ + "ia32" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ] + }, + "node_modules/@rollup/rollup-win32-x64-gnu": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-win32-x64-gnu/-/rollup-win32-x64-gnu-4.53.3.tgz", + "integrity": "sha512-J9QDiOIZlZLdcot5NXEepDkstocktoVjkaKUtqzgzpt2yWjGlbYiKyp05rWwk4nypbYUNoFAztEgixoLaSETkg==", + "cpu": [ + "x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ] + }, + "node_modules/@rollup/rollup-win32-x64-msvc": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@rollup/rollup-win32-x64-msvc/-/rollup-win32-x64-msvc-4.53.3.tgz", + "integrity": "sha512-UhTd8u31dXadv0MopwGgNOBpUVROFKWVQgAg5N1ESyCz8AuBcMqm4AuTjrwgQKGDfoFuz02EuMRHQIw/frmYKQ==", + "cpu": [ + 
"x64" + ], + "dev": true, + "license": "MIT", + "optional": true, + "os": [ + "win32" + ] + }, + "node_modules/@types/babel__core": { + "version": "7.20.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@types/babel__core/-/babel__core-7.20.5.tgz", + "integrity": "sha512-qoQprZvz5wQFJwMDqeseRXWv3rqMvhgpbXFfVyWhbx9X47POIA6i/+dXefEmZKoAgOaTdaIgNSMqMIU61yRyzA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/parser": "^7.20.7", + "@babel/types": "^7.20.7", + "@types/babel__generator": "*", + "@types/babel__template": "*", + "@types/babel__traverse": "*" + } + }, + "node_modules/@types/babel__generator": { + "version": "7.27.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@types/babel__generator/-/babel__generator-7.27.0.tgz", + "integrity": "sha512-ufFd2Xi92OAVPYsy+P4n7/U7e68fex0+Ee8gSG9KX7eo084CWiQ4sdxktvdl0bOPupXtVJPY19zk6EwWqUQ8lg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/types": "^7.0.0" + } + }, + "node_modules/@types/babel__template": { + "version": "7.4.4", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@types/babel__template/-/babel__template-7.4.4.tgz", + "integrity": "sha512-h/NUaSyG5EyxBIp8YRxo4RMe2/qQgvyowRwVMzhYhBCONbW8PUsg4lkFMrhgZhUe5z3L3MiLDuvyJ/CaPa2A8A==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/parser": "^7.1.0", + "@babel/types": "^7.0.0" + } + }, + "node_modules/@types/babel__traverse": { + "version": "7.28.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@types/babel__traverse/-/babel__traverse-7.28.0.tgz", + "integrity": "sha512-8PvcXf70gTDZBgt9ptxJ8elBeBjcLOAcOtoO/mPJjtji1+CdGbHgm77om1GrsPxsiE+uXIpNSK64UYaIwQXd4Q==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/types": "^7.28.2" + } + }, + "node_modules/@types/estree": { + "version": "1.0.8", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@types/estree/-/estree-1.0.8.tgz", + "integrity": 
"sha512-dWHzHa2WqEXI/O1E9OjrocMTKJl2mSrEolh1Iomrv6U+JuNwaHXsXx9bLu5gG7BUWFIN0skIQJQ/L1rIex4X6w==", + "dev": true, + "license": "MIT" + }, + "node_modules/@types/prop-types": { + "version": "15.7.15", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@types/prop-types/-/prop-types-15.7.15.tgz", + "integrity": "sha512-F6bEyamV9jKGAFBEmlQnesRPGOQqS2+Uwi0Em15xenOxHaf2hv6L8YCVn3rPdPJOiJfPiCnLIRyvwVaqMY3MIw==", + "dev": true, + "license": "MIT" + }, + "node_modules/@types/react": { + "version": "18.3.27", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@types/react/-/react-18.3.27.tgz", + "integrity": "sha512-cisd7gxkzjBKU2GgdYrTdtQx1SORymWyaAFhaxQPK9bYO9ot3Y5OikQRvY0VYQtvwjeQnizCINJAenh/V7MK2w==", + "dev": true, + "license": "MIT", + "peer": true, + "dependencies": { + "@types/prop-types": "*", + "csstype": "^3.2.2" + } + }, + "node_modules/@types/react-dom": { + "version": "18.3.7", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@types/react-dom/-/react-dom-18.3.7.tgz", + "integrity": "sha512-MEe3UeoENYVFXzoXEWsvcpg6ZvlrFNlOQ7EOsvhI3CfAXwzPfO8Qwuxd40nepsYKqyyVQnTdEfv68q91yLcKrQ==", + "dev": true, + "license": "MIT", + "peerDependencies": { + "@types/react": "^18.0.0" + } + }, + "node_modules/@vitejs/plugin-react": { + "version": "4.7.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/@vitejs/plugin-react/-/plugin-react-4.7.0.tgz", + "integrity": "sha512-gUu9hwfWvvEDBBmgtAowQCojwZmJ5mcLn3aufeCsitijs3+f2NsrPtlAWIR6OPiqljl96GVCUbLe0HyqIpVaoA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@babel/core": "^7.28.0", + "@babel/plugin-transform-react-jsx-self": "^7.27.1", + "@babel/plugin-transform-react-jsx-source": "^7.27.1", + "@rolldown/pluginutils": "1.0.0-beta.27", + "@types/babel__core": "^7.20.5", + "react-refresh": "^0.17.0" + }, + "engines": { + "node": "^14.18.0 || >=16.0.0" + }, + "peerDependencies": { + "vite": "^4.2.0 || ^5.0.0 || ^6.0.0 || ^7.0.0" + } + }, + "node_modules/any-promise": { + 
"version": "1.3.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/any-promise/-/any-promise-1.3.0.tgz", + "integrity": "sha512-7UvmKalWRt1wgjL1RrGxoSJW/0QZFIegpeGvZG9kjp8vrRu55XTHbwnqq2GpXm9uLbcuhxm3IqX9OB4MZR1b2A==", + "dev": true, + "license": "MIT" + }, + "node_modules/anymatch": { + "version": "3.1.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/anymatch/-/anymatch-3.1.3.tgz", + "integrity": "sha512-KMReFUr0B4t+D+OBkjR3KYqvocp2XaSzO55UcB6mgQMd3KbcE+mWTyvVV7D/zsdEbNnV6acZUutkiHQXvTr1Rw==", + "dev": true, + "license": "ISC", + "dependencies": { + "normalize-path": "^3.0.0", + "picomatch": "^2.0.4" + }, + "engines": { + "node": ">= 8" + } + }, + "node_modules/arg": { + "version": "5.0.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/arg/-/arg-5.0.2.tgz", + "integrity": "sha512-PYjyFOLKQ9y57JvQ6QLo8dAgNqswh8M1RMJYdQduT6xbWSgK36P/Z/v+p888pM69jMMfS8Xd8F6I1kQ/I9HUGg==", + "dev": true, + "license": "MIT" + }, + "node_modules/asynckit": { + "version": "0.4.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/asynckit/-/asynckit-0.4.0.tgz", + "integrity": "sha512-Oei9OH4tRh0YqU3GxhX79dM/mwVgvbZJaSNaRk+bshkj0S5cfHcgYakreBjrHwatXKbz+IoIdYLxrKim2MjW0Q==", + "license": "MIT" + }, + "node_modules/autoprefixer": { + "version": "10.4.22", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/autoprefixer/-/autoprefixer-10.4.22.tgz", + "integrity": "sha512-ARe0v/t9gO28Bznv6GgqARmVqcWOV3mfgUPn9becPHMiD3o9BwlRgaeccZnwTpZ7Zwqrm+c1sUSsMxIzQzc8Xg==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/postcss/" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/autoprefixer" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "browserslist": "^4.27.0", + "caniuse-lite": "^1.0.30001754", + "fraction.js": "^5.3.4", + "normalize-range": "^0.1.2", + "picocolors": "^1.1.1", 
+ "postcss-value-parser": "^4.2.0" + }, + "bin": { + "autoprefixer": "bin/autoprefixer" + }, + "engines": { + "node": "^10 || ^12 || >=14" + }, + "peerDependencies": { + "postcss": "^8.1.0" + } + }, + "node_modules/axios": { + "version": "1.13.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/axios/-/axios-1.13.2.tgz", + "integrity": "sha512-VPk9ebNqPcy5lRGuSlKx752IlDatOjT9paPlm8A7yOuW2Fbvp4X3JznJtT4f0GzGLLiWE9W8onz51SqLYwzGaA==", + "license": "MIT", + "dependencies": { + "follow-redirects": "^1.15.6", + "form-data": "^4.0.4", + "proxy-from-env": "^1.1.0" + } + }, + "node_modules/baseline-browser-mapping": { + "version": "2.8.29", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/baseline-browser-mapping/-/baseline-browser-mapping-2.8.29.tgz", + "integrity": "sha512-sXdt2elaVnhpDNRDz+1BDx1JQoJRuNk7oVlAlbGiFkLikHCAQiccexF/9e91zVi6RCgqspl04aP+6Cnl9zRLrA==", + "dev": true, + "license": "Apache-2.0", + "bin": { + "baseline-browser-mapping": "dist/cli.js" + } + }, + "node_modules/binary-extensions": { + "version": "2.3.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/binary-extensions/-/binary-extensions-2.3.0.tgz", + "integrity": "sha512-Ceh+7ox5qe7LJuLHoY0feh3pHuUDHAcRUeyL2VYghZwfpkNIy/+8Ocg0a3UuSoYzavmylwuLWQOf3hl0jjMMIw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8" + }, + "funding": { + "url": "https://github.com/sponsors/sindresorhus" + } + }, + "node_modules/braces": { + "version": "3.0.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/braces/-/braces-3.0.3.tgz", + "integrity": "sha512-yQbXgO/OSZVD2IsiLlro+7Hf6Q18EJrKSEsdoMzKePKXct3gvD8oLcOQdIzGupr5Fj+EDe8gO/lxc1BzfMpxvA==", + "dev": true, + "license": "MIT", + "dependencies": { + "fill-range": "^7.1.1" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/browserslist": { + "version": "4.28.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/browserslist/-/browserslist-4.28.0.tgz", + "integrity": 
"sha512-tbydkR/CxfMwelN0vwdP/pLkDwyAASZ+VfWm4EOwlB6SWhx1sYnWLqo8N5j0rAzPfzfRaxt0mM/4wPU/Su84RQ==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/browserslist" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "peer": true, + "dependencies": { + "baseline-browser-mapping": "^2.8.25", + "caniuse-lite": "^1.0.30001754", + "electron-to-chromium": "^1.5.249", + "node-releases": "^2.0.27", + "update-browserslist-db": "^1.1.4" + }, + "bin": { + "browserslist": "cli.js" + }, + "engines": { + "node": "^6 || ^7 || ^8 || ^9 || ^10 || ^11 || ^12 || >=13.7" + } + }, + "node_modules/call-bind-apply-helpers": { + "version": "1.0.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/call-bind-apply-helpers/-/call-bind-apply-helpers-1.0.2.tgz", + "integrity": "sha512-Sp1ablJ0ivDkSzjcaJdxEunN5/XvksFJ2sMBFfq6x0ryhQV/2b/KwFe21cMpmHtPOSij8K99/wSfoEuTObmuMQ==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "function-bind": "^1.1.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/camelcase-css": { + "version": "2.0.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/camelcase-css/-/camelcase-css-2.0.1.tgz", + "integrity": "sha512-QOSvevhslijgYwRx6Rv7zKdMF8lbRmx+uQGx2+vDc+KI/eBnsy9kit5aj23AgGu3pa4t9AgwbnXWqS+iOY+2aA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 6" + } + }, + "node_modules/caniuse-lite": { + "version": "1.0.30001756", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/caniuse-lite/-/caniuse-lite-1.0.30001756.tgz", + "integrity": "sha512-4HnCNKbMLkLdhJz3TToeVWHSnfJvPaq6vu/eRP0Ahub/07n484XHhBF5AJoSGHdVrS8tKFauUQz8Bp9P7LVx7A==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": 
"https://tidelift.com/funding/github/npm/caniuse-lite" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "CC-BY-4.0" + }, + "node_modules/chokidar": { + "version": "3.6.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/chokidar/-/chokidar-3.6.0.tgz", + "integrity": "sha512-7VT13fmjotKpGipCW9JEQAusEPE+Ei8nl6/g4FBAmIm0GOOLMua9NDDo/DWp0ZAxCr3cPq5ZpBqmPAQgDda2Pw==", + "dev": true, + "license": "MIT", + "dependencies": { + "anymatch": "~3.1.2", + "braces": "~3.0.2", + "glob-parent": "~5.1.2", + "is-binary-path": "~2.1.0", + "is-glob": "~4.0.1", + "normalize-path": "~3.0.0", + "readdirp": "~3.6.0" + }, + "engines": { + "node": ">= 8.10.0" + }, + "funding": { + "url": "https://paulmillr.com/funding/" + }, + "optionalDependencies": { + "fsevents": "~2.3.2" + } + }, + "node_modules/chokidar/node_modules/glob-parent": { + "version": "5.1.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/glob-parent/-/glob-parent-5.1.2.tgz", + "integrity": "sha512-AOIgSQCepiJYwP3ARnGx+5VnTu2HBYdzbGP45eLw1vr3zB3vZLeyed1sC9hnbcOc9/SrMyM5RPQrkGz4aS9Zow==", + "dev": true, + "license": "ISC", + "dependencies": { + "is-glob": "^4.0.1" + }, + "engines": { + "node": ">= 6" + } + }, + "node_modules/combined-stream": { + "version": "1.0.8", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/combined-stream/-/combined-stream-1.0.8.tgz", + "integrity": "sha512-FQN4MRfuJeHf7cBbBMJFXhKSDq+2kAArBlmRBvcvFE5BB1HZKXtSFASDhdlz9zOYwxh8lDdnvmMOe/+5cdoEdg==", + "license": "MIT", + "dependencies": { + "delayed-stream": "~1.0.0" + }, + "engines": { + "node": ">= 0.8" + } + }, + "node_modules/commander": { + "version": "4.1.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/commander/-/commander-4.1.1.tgz", + "integrity": "sha512-NOKm8xhkzAjzFx8B2v5OAHT+u5pRQc2UCa2Vq9jYL/31o2wi9mxBA7LIFs3sV5VSC49z6pEhfbMULvShKj26WA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 6" + } + }, + 
"node_modules/convert-source-map": { + "version": "2.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/convert-source-map/-/convert-source-map-2.0.0.tgz", + "integrity": "sha512-Kvp459HrV2FEJ1CAsi1Ku+MY3kasH19TFykTz2xWmMeq6bk2NU3XXvfJ+Q61m0xktWwt+1HSYf3JZsTms3aRJg==", + "dev": true, + "license": "MIT" + }, + "node_modules/cssesc": { + "version": "3.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/cssesc/-/cssesc-3.0.0.tgz", + "integrity": "sha512-/Tb/JcjK111nNScGob5MNtsntNM1aCNUDipB/TkwZFhyDrrE47SOx/18wF2bbjgc3ZzCSKW1T5nt5EbFoAz/Vg==", + "dev": true, + "license": "MIT", + "bin": { + "cssesc": "bin/cssesc" + }, + "engines": { + "node": ">=4" + } + }, + "node_modules/csstype": { + "version": "3.2.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/csstype/-/csstype-3.2.3.tgz", + "integrity": "sha512-z1HGKcYy2xA8AGQfwrn0PAy+PB7X/GSj3UVJW9qKyn43xWa+gl5nXmU4qqLMRzWVLFC8KusUX8T/0kCiOYpAIQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/debug": { + "version": "4.4.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/debug/-/debug-4.4.3.tgz", + "integrity": "sha512-RGwwWnwQvkVfavKVt22FGLw+xYSdzARwm0ru6DhTVA3umU5hZc28V3kO4stgYryrTlLpuvgI9GiijltAjNbcqA==", + "dev": true, + "license": "MIT", + "dependencies": { + "ms": "^2.1.3" + }, + "engines": { + "node": ">=6.0" + }, + "peerDependenciesMeta": { + "supports-color": { + "optional": true + } + } + }, + "node_modules/delayed-stream": { + "version": "1.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/delayed-stream/-/delayed-stream-1.0.0.tgz", + "integrity": "sha512-ZySD7Nf91aLB0RxL4KGrKHBXl7Eds1DAmEdcoVawXnLD7SDhpNgtuII2aAkg7a7QS41jxPSZ17p4VdGnMHk3MQ==", + "license": "MIT", + "engines": { + "node": ">=0.4.0" + } + }, + "node_modules/didyoumean": { + "version": "1.2.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/didyoumean/-/didyoumean-1.2.2.tgz", + "integrity": 
"sha512-gxtyfqMg7GKyhQmb056K7M3xszy/myH8w+B4RT+QXBQsvAOdc3XymqDDPHx1BgPgsdAA5SIifona89YtRATDzw==", + "dev": true, + "license": "Apache-2.0" + }, + "node_modules/dlv": { + "version": "1.1.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/dlv/-/dlv-1.1.3.tgz", + "integrity": "sha512-+HlytyjlPKnIG8XuRG8WvmBP8xs8P71y+SKKS6ZXWoEgLuePxtDoUEiH7WkdePWrQ5JBpE6aoVqfZfJUQkjXwA==", + "dev": true, + "license": "MIT" + }, + "node_modules/dunder-proto": { + "version": "1.0.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/dunder-proto/-/dunder-proto-1.0.1.tgz", + "integrity": "sha512-KIN/nDJBQRcXw0MLVhZE9iQHmG68qAVIBg9CqmUYjmQIhgij9U5MFvrqkUL5FbtyyzZuOeOt0zdeRe4UY7ct+A==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.1", + "es-errors": "^1.3.0", + "gopd": "^1.2.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/electron-to-chromium": { + "version": "1.5.258", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/electron-to-chromium/-/electron-to-chromium-1.5.258.tgz", + "integrity": "sha512-rHUggNV5jKQ0sSdWwlaRDkFc3/rRJIVnOSe9yR4zrR07m3ZxhP4N27Hlg8VeJGGYgFTxK5NqDmWI4DSH72vIJg==", + "dev": true, + "license": "ISC" + }, + "node_modules/es-define-property": { + "version": "1.0.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/es-define-property/-/es-define-property-1.0.1.tgz", + "integrity": "sha512-e3nRfgfUZ4rNGL232gUgX06QNyyez04KdjFrF+LTRoOXmrOgFKDg4BCdsjW8EnT69eqdYGmRpJwiPVYNrCaW3g==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-errors": { + "version": "1.3.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/es-errors/-/es-errors-1.3.0.tgz", + "integrity": "sha512-Zf5H2Kxt2xjTvbJvP2ZWLEICxA6j+hAmMzIlypy4xcBg1vKVnx89Wy0GbS+kf5cwCVFFzdCFh2XSCFNULS6csw==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-object-atoms": { + "version": "1.1.1", + "resolved": 
"https://mirrors.huaweicloud.com/repository/npm/es-object-atoms/-/es-object-atoms-1.1.1.tgz", + "integrity": "sha512-FGgH2h8zKNim9ljj7dankFPcICIK9Cp5bm+c2gQSYePhpaG5+esrLODihIorn+Pe6FGJzWhXQotPv73jTaldXA==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/es-set-tostringtag": { + "version": "2.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/es-set-tostringtag/-/es-set-tostringtag-2.1.0.tgz", + "integrity": "sha512-j6vWzfrGVfyXxge+O0x5sh6cvxAog0a/4Rdd2K36zCMV5eJ+/+tOAngRO8cODMNWbVRdVlmGZQL2YS3yR8bIUA==", + "license": "MIT", + "dependencies": { + "es-errors": "^1.3.0", + "get-intrinsic": "^1.2.6", + "has-tostringtag": "^1.0.2", + "hasown": "^2.0.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/esbuild": { + "version": "0.21.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/esbuild/-/esbuild-0.21.5.tgz", + "integrity": "sha512-mg3OPMV4hXywwpoDxu3Qda5xCKQi+vCTZq8S9J/EpkhB2HzKXq4SNFZE3+NK93JYxc8VMSep+lOUSC/RVKaBqw==", + "dev": true, + "hasInstallScript": true, + "license": "MIT", + "bin": { + "esbuild": "bin/esbuild" + }, + "engines": { + "node": ">=12" + }, + "optionalDependencies": { + "@esbuild/aix-ppc64": "0.21.5", + "@esbuild/android-arm": "0.21.5", + "@esbuild/android-arm64": "0.21.5", + "@esbuild/android-x64": "0.21.5", + "@esbuild/darwin-arm64": "0.21.5", + "@esbuild/darwin-x64": "0.21.5", + "@esbuild/freebsd-arm64": "0.21.5", + "@esbuild/freebsd-x64": "0.21.5", + "@esbuild/linux-arm": "0.21.5", + "@esbuild/linux-arm64": "0.21.5", + "@esbuild/linux-ia32": "0.21.5", + "@esbuild/linux-loong64": "0.21.5", + "@esbuild/linux-mips64el": "0.21.5", + "@esbuild/linux-ppc64": "0.21.5", + "@esbuild/linux-riscv64": "0.21.5", + "@esbuild/linux-s390x": "0.21.5", + "@esbuild/linux-x64": "0.21.5", + "@esbuild/netbsd-x64": "0.21.5", + "@esbuild/openbsd-x64": "0.21.5", + "@esbuild/sunos-x64": "0.21.5", + "@esbuild/win32-arm64": "0.21.5", + 
"@esbuild/win32-ia32": "0.21.5", + "@esbuild/win32-x64": "0.21.5" + } + }, + "node_modules/escalade": { + "version": "3.2.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/escalade/-/escalade-3.2.0.tgz", + "integrity": "sha512-WUj2qlxaQtO4g6Pq5c29GTcWGDyd8itL8zTlipgECz3JesAiiOKotd8JU6otB3PACgG6xkJUyVhboMS+bje/jA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6" + } + }, + "node_modules/fast-glob": { + "version": "3.3.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/fast-glob/-/fast-glob-3.3.3.tgz", + "integrity": "sha512-7MptL8U0cqcFdzIzwOTHoilX9x5BrNqye7Z/LuC7kCMRio1EMSyqRK3BEAUD7sXRq4iT4AzTVuZdhgQ2TCvYLg==", + "dev": true, + "license": "MIT", + "dependencies": { + "@nodelib/fs.stat": "^2.0.2", + "@nodelib/fs.walk": "^1.2.3", + "glob-parent": "^5.1.2", + "merge2": "^1.3.0", + "micromatch": "^4.0.8" + }, + "engines": { + "node": ">=8.6.0" + } + }, + "node_modules/fast-glob/node_modules/glob-parent": { + "version": "5.1.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/glob-parent/-/glob-parent-5.1.2.tgz", + "integrity": "sha512-AOIgSQCepiJYwP3ARnGx+5VnTu2HBYdzbGP45eLw1vr3zB3vZLeyed1sC9hnbcOc9/SrMyM5RPQrkGz4aS9Zow==", + "dev": true, + "license": "ISC", + "dependencies": { + "is-glob": "^4.0.1" + }, + "engines": { + "node": ">= 6" + } + }, + "node_modules/fastq": { + "version": "1.19.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/fastq/-/fastq-1.19.1.tgz", + "integrity": "sha512-GwLTyxkCXjXbxqIhTsMI2Nui8huMPtnxg7krajPJAjnEG/iiOS7i+zCtWGZR9G0NBKbXKh6X9m9UIsYX/N6vvQ==", + "dev": true, + "license": "ISC", + "dependencies": { + "reusify": "^1.0.4" + } + }, + "node_modules/fill-range": { + "version": "7.1.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/fill-range/-/fill-range-7.1.1.tgz", + "integrity": "sha512-YsGpe3WHLK8ZYi4tWDg2Jy3ebRz2rXowDxnld4bkQB00cc/1Zw9AWnC0i9ztDJitivtQvaI9KaLyKrc+hBW0yg==", + "dev": true, + "license": "MIT", + "dependencies": { + 
"to-regex-range": "^5.0.1" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/follow-redirects": { + "version": "1.15.11", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/follow-redirects/-/follow-redirects-1.15.11.tgz", + "integrity": "sha512-deG2P0JfjrTxl50XGCDyfI97ZGVCxIpfKYmfyrQ54n5FO/0gfIES8C/Psl6kWVDolizcaaxZJnTS0QSMxvnsBQ==", + "funding": [ + { + "type": "individual", + "url": "https://github.com/sponsors/RubenVerborgh" + } + ], + "license": "MIT", + "engines": { + "node": ">=4.0" + }, + "peerDependenciesMeta": { + "debug": { + "optional": true + } + } + }, + "node_modules/form-data": { + "version": "4.0.5", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/form-data/-/form-data-4.0.5.tgz", + "integrity": "sha512-8RipRLol37bNs2bhoV67fiTEvdTrbMUYcFTiy3+wuuOnUog2QBHCZWXDRijWQfAkhBj2Uf5UnVaiWwA5vdd82w==", + "license": "MIT", + "dependencies": { + "asynckit": "^0.4.0", + "combined-stream": "^1.0.8", + "es-set-tostringtag": "^2.1.0", + "hasown": "^2.0.2", + "mime-types": "^2.1.12" + }, + "engines": { + "node": ">= 6" + } + }, + "node_modules/fraction.js": { + "version": "5.3.4", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/fraction.js/-/fraction.js-5.3.4.tgz", + "integrity": "sha512-1X1NTtiJphryn/uLQz3whtY6jK3fTqoE3ohKs0tT+Ujr1W59oopxmoEh7Lu5p6vBaPbgoM0bzveAW4Qi5RyWDQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": "*" + }, + "funding": { + "type": "github", + "url": "https://github.com/sponsors/rawify" + } + }, + "node_modules/fsevents": { + "version": "2.3.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/fsevents/-/fsevents-2.3.3.tgz", + "integrity": "sha512-5xoDfX+fL7faATnagmWPpbFtwh/R77WmMMqqHGS65C3vvB0YHrgF+B1YmZ3441tMj5n63k0212XNoJwzlhffQw==", + "dev": true, + "hasInstallScript": true, + "license": "MIT", + "optional": true, + "os": [ + "darwin" + ], + "engines": { + "node": "^8.16.0 || ^10.6.0 || >=11.0.0" + } + }, + "node_modules/function-bind": { + "version": "1.1.2", + 
"resolved": "https://mirrors.huaweicloud.com/repository/npm/function-bind/-/function-bind-1.1.2.tgz", + "integrity": "sha512-7XHNxH7qX9xG5mIwxkhumTox/MIRNcOgDrxWsMt2pAr23WHp6MrRlN7FBSFpCpr+oVO0F744iUgR82nJMfG2SA==", + "license": "MIT", + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/gensync": { + "version": "1.0.0-beta.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/gensync/-/gensync-1.0.0-beta.2.tgz", + "integrity": "sha512-3hN7NaskYvMDLQY55gnW3NQ+mesEAepTqlg+VEbj7zzqEMBVNhzcGYYeqFo/TlYz6eQiFcp1HcsCZO+nGgS8zg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=6.9.0" + } + }, + "node_modules/get-intrinsic": { + "version": "1.3.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/get-intrinsic/-/get-intrinsic-1.3.0.tgz", + "integrity": "sha512-9fSjSaos/fRIVIp+xSJlE6lfwhES7LNtKaCBIamHsjr2na1BiABJPo0mOjjz8GJDURarmCPGqaiVg5mfjb98CQ==", + "license": "MIT", + "dependencies": { + "call-bind-apply-helpers": "^1.0.2", + "es-define-property": "^1.0.1", + "es-errors": "^1.3.0", + "es-object-atoms": "^1.1.1", + "function-bind": "^1.1.2", + "get-proto": "^1.0.1", + "gopd": "^1.2.0", + "has-symbols": "^1.1.0", + "hasown": "^2.0.2", + "math-intrinsics": "^1.1.0" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/get-proto": { + "version": "1.0.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/get-proto/-/get-proto-1.0.1.tgz", + "integrity": "sha512-sTSfBjoXBp89JvIKIefqw7U2CCebsc74kiY6awiGogKtoSGbgjYE/G/+l9sF3MWFPNc9IcoOC4ODfKHfxFmp0g==", + "license": "MIT", + "dependencies": { + "dunder-proto": "^1.0.1", + "es-object-atoms": "^1.0.0" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/glob-parent": { + "version": "6.0.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/glob-parent/-/glob-parent-6.0.2.tgz", + "integrity": 
"sha512-XxwI8EOhVQgWp6iDL+3b0r86f4d6AX6zSU55HfB4ydCEuXLXc5FcYeOu+nnGftS4TEju/11rt4KJPTMgbfmv4A==", + "dev": true, + "license": "ISC", + "dependencies": { + "is-glob": "^4.0.3" + }, + "engines": { + "node": ">=10.13.0" + } + }, + "node_modules/gopd": { + "version": "1.2.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/gopd/-/gopd-1.2.0.tgz", + "integrity": "sha512-ZUKRh6/kUFoAiTAtTYPZJ3hw9wNxx+BIBOijnlG9PnrJsCcSjs1wyyD6vJpaYtgnzDrKYRSqf3OO6Rfa93xsRg==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/has-symbols": { + "version": "1.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/has-symbols/-/has-symbols-1.1.0.tgz", + "integrity": "sha512-1cDNdwJ2Jaohmb3sg4OmKaMBwuC48sYni5HUw2DvsC8LjGTLK9h+eb1X6RyuOHe4hT0ULCW68iomhjUoKUqlPQ==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/has-tostringtag": { + "version": "1.0.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/has-tostringtag/-/has-tostringtag-1.0.2.tgz", + "integrity": "sha512-NqADB8VjPFLM2V0VvHUewwwsw0ZWBaIdgo+ieHtK3hasLz4qeCRjYcqfB6AQrBggRKppKF8L52/VqdVsO47Dlw==", + "license": "MIT", + "dependencies": { + "has-symbols": "^1.0.3" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/hasown": { + "version": "2.0.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/hasown/-/hasown-2.0.2.tgz", + "integrity": "sha512-0hJU9SCPvmMzIBdZFqNPXWa6dqh7WdH0cII9y+CyS8rG3nL48Bclra9HmKhVVUHyPWNH5Y7xDwAB7bfgSjkUMQ==", + "license": "MIT", + "dependencies": { + "function-bind": "^1.1.2" + }, + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/is-binary-path": { + "version": "2.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/is-binary-path/-/is-binary-path-2.1.0.tgz", + 
"integrity": "sha512-ZMERYes6pDydyuGidse7OsHxtbI7WVeUEozgR/g7rd0xUimYNlvZRE/K2MgZTjWy725IfelLeVcEM97mmtRGXw==", + "dev": true, + "license": "MIT", + "dependencies": { + "binary-extensions": "^2.0.0" + }, + "engines": { + "node": ">=8" + } + }, + "node_modules/is-core-module": { + "version": "2.16.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/is-core-module/-/is-core-module-2.16.1.tgz", + "integrity": "sha512-UfoeMA6fIJ8wTYFEUjelnaGI67v6+N7qXJEvQuIGa99l4xsCruSYOVSQ0uPANn4dAzm8lkYPaKLrrijLq7x23w==", + "dev": true, + "license": "MIT", + "dependencies": { + "hasown": "^2.0.2" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/is-extglob": { + "version": "2.1.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/is-extglob/-/is-extglob-2.1.1.tgz", + "integrity": "sha512-SbKbANkN603Vi4jEZv49LeVJMn4yGwsbzZworEoyEiutsN3nJYdbO36zfhGJ6QEDpOZIFkDtnq5JRxmvl3jsoQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/is-glob": { + "version": "4.0.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/is-glob/-/is-glob-4.0.3.tgz", + "integrity": "sha512-xelSayHH36ZgE7ZWhli7pW34hNbNl8Ojv5KVmkJD4hBdD3th8Tfk9vYasLM+mXWOZhFkgZfxhLSnrwRr4elSSg==", + "dev": true, + "license": "MIT", + "dependencies": { + "is-extglob": "^2.1.1" + }, + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/is-number": { + "version": "7.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/is-number/-/is-number-7.0.0.tgz", + "integrity": "sha512-41Cifkg6e8TylSpdtTpeLVMqvSBEVzTttHvERD741+pnZ8ANv0004MRL43QKPDlK9cGvNp6NZWZUBlbGXYxxng==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.12.0" + } + }, + "node_modules/jiti": { + "version": "1.21.7", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/jiti/-/jiti-1.21.7.tgz", + "integrity": 
"sha512-/imKNG4EbWNrVjoNC/1H5/9GFy+tqjGBHCaSsN+P2RnPqjsLmv6UD3Ej+Kj8nBWaRAwyk7kK5ZUc+OEatnTR3A==", + "dev": true, + "license": "MIT", + "peer": true, + "bin": { + "jiti": "bin/jiti.js" + } + }, + "node_modules/js-tokens": { + "version": "4.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/js-tokens/-/js-tokens-4.0.0.tgz", + "integrity": "sha512-RdJUflcE3cUzKiMqQgsCu06FPu9UdIJO0beYbPhHN4k6apgJtifcoCtT9bcxOpYBtpD2kCM6Sbzg4CausW/PKQ==", + "license": "MIT" + }, + "node_modules/jsesc": { + "version": "3.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/jsesc/-/jsesc-3.1.0.tgz", + "integrity": "sha512-/sM3dO2FOzXjKQhJuo0Q173wf2KOo8t4I8vHy6lF9poUp7bKT0/NHE8fPX23PwfhnykfqnC2xRxOnVw5XuGIaA==", + "dev": true, + "license": "MIT", + "bin": { + "jsesc": "bin/jsesc" + }, + "engines": { + "node": ">=6" + } + }, + "node_modules/json5": { + "version": "2.2.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/json5/-/json5-2.2.3.tgz", + "integrity": "sha512-XmOWe7eyHYH14cLdVPoyg+GOH3rYX++KpzrylJwSW98t3Nk+U8XOl8FWKOgwtzdb8lXGf6zYwDUzeHMWfxasyg==", + "dev": true, + "license": "MIT", + "bin": { + "json5": "lib/cli.js" + }, + "engines": { + "node": ">=6" + } + }, + "node_modules/lilconfig": { + "version": "3.1.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/lilconfig/-/lilconfig-3.1.3.tgz", + "integrity": "sha512-/vlFKAoH5Cgt3Ie+JLhRbwOsCQePABiU3tJ1egGvyQ+33R/vcwM2Zl2QR/LzjsBeItPt3oSVXapn+m4nQDvpzw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=14" + }, + "funding": { + "url": "https://github.com/sponsors/antonk52" + } + }, + "node_modules/lines-and-columns": { + "version": "1.2.4", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/lines-and-columns/-/lines-and-columns-1.2.4.tgz", + "integrity": "sha512-7ylylesZQ/PV29jhEDl3Ufjo6ZX7gCqJr5F7PKrqc93v7fzSymt1BpwEU8nAUXs8qzzvqhbjhK5QZg6Mt/HkBg==", + "dev": true, + "license": "MIT" + }, + "node_modules/loose-envify": { + "version": "1.4.0", + 
"resolved": "https://mirrors.huaweicloud.com/repository/npm/loose-envify/-/loose-envify-1.4.0.tgz", + "integrity": "sha512-lyuxPGr/Wfhrlem2CL/UcnUc1zcqKAImBDzukY7Y5F/yQiNdko6+fRLevlw1HgMySw7f611UIY408EtxRSoK3Q==", + "license": "MIT", + "dependencies": { + "js-tokens": "^3.0.0 || ^4.0.0" + }, + "bin": { + "loose-envify": "cli.js" + } + }, + "node_modules/lru-cache": { + "version": "5.1.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/lru-cache/-/lru-cache-5.1.1.tgz", + "integrity": "sha512-KpNARQA3Iwv+jTA0utUVVbrh+Jlrr1Fv0e56GGzAFOXN7dk/FviaDW8LHmK52DlcH4WP2n6gI8vN1aesBFgo9w==", + "dev": true, + "license": "ISC", + "dependencies": { + "yallist": "^3.0.2" + } + }, + "node_modules/lucide-react": { + "version": "0.292.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/lucide-react/-/lucide-react-0.292.0.tgz", + "integrity": "sha512-rRgUkpEHWpa5VCT66YscInCQmQuPCB1RFRzkkxMxg4b+jaL0V12E3riWWR2Sh5OIiUhCwGW/ZExuEO4Az32E6Q==", + "license": "ISC", + "peerDependencies": { + "react": "^16.5.1 || ^17.0.0 || ^18.0.0" + } + }, + "node_modules/math-intrinsics": { + "version": "1.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/math-intrinsics/-/math-intrinsics-1.1.0.tgz", + "integrity": "sha512-/IXtbwEk5HTPyEwyKX6hGkYXxM9nbj64B+ilVJnC/R6B0pH5G4V3b0pVbL7DBj4tkhBAppbQUlf6F6Xl9LHu1g==", + "license": "MIT", + "engines": { + "node": ">= 0.4" + } + }, + "node_modules/merge2": { + "version": "1.4.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/merge2/-/merge2-1.4.1.tgz", + "integrity": "sha512-8q7VEgMJW4J8tcfVPy8g09NcQwZdbwFEqhe/WZkoIzjn/3TGDwtOCYtXGxA3O8tPzpczCCDgv+P2P5y00ZJOOg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 8" + } + }, + "node_modules/micromatch": { + "version": "4.0.8", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/micromatch/-/micromatch-4.0.8.tgz", + "integrity": "sha512-PXwfBhYu0hBCPw8Dn0E+WDYb7af3dSLVWKi3HGv84IdF4TyFoC0ysxFd0Goxw7nSv4T/PzEJQxsYsEiFCKo2BA==", + 
"dev": true, + "license": "MIT", + "dependencies": { + "braces": "^3.0.3", + "picomatch": "^2.3.1" + }, + "engines": { + "node": ">=8.6" + } + }, + "node_modules/mime-db": { + "version": "1.52.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/mime-db/-/mime-db-1.52.0.tgz", + "integrity": "sha512-sPU4uV7dYlvtWJxwwxHD0PuihVNiE7TyAbQ5SWxDCB9mUYvOgroQOwYQQOKPJ8CIbE+1ETVlOoK1UC2nU3gYvg==", + "license": "MIT", + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/mime-types": { + "version": "2.1.35", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/mime-types/-/mime-types-2.1.35.tgz", + "integrity": "sha512-ZDY+bPm5zTTF+YpCrAU9nK0UgICYPT0QtT1NZWFv4s++TNkcgVaT0g6+4R2uI4MjQjzysHB1zxuWL50hzaeXiw==", + "license": "MIT", + "dependencies": { + "mime-db": "1.52.0" + }, + "engines": { + "node": ">= 0.6" + } + }, + "node_modules/ms": { + "version": "2.1.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/ms/-/ms-2.1.3.tgz", + "integrity": "sha512-6FlzubTLZG3J2a/NVCAleEhjzq5oxgHyaCU9yYXvcLsvoVaHJq/s5xXI6/XXP6tz7R9xAOtHnSO/tXtF3WRTlA==", + "dev": true, + "license": "MIT" + }, + "node_modules/mz": { + "version": "2.7.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/mz/-/mz-2.7.0.tgz", + "integrity": "sha512-z81GNO7nnYMEhrGh9LeymoE4+Yr0Wn5McHIZMK5cfQCl+NDX08sCZgUc9/6MHni9IWuFLm1Z3HTCXu2z9fN62Q==", + "dev": true, + "license": "MIT", + "dependencies": { + "any-promise": "^1.0.0", + "object-assign": "^4.0.1", + "thenify-all": "^1.0.0" + } + }, + "node_modules/nanoid": { + "version": "3.3.11", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/nanoid/-/nanoid-3.3.11.tgz", + "integrity": "sha512-N8SpfPUnUp1bK+PMYW8qSWdl9U+wwNWI4QKxOYDy9JAro3WMX7p2OeVRF9v+347pnakNevPmiHhNmZ2HbFA76w==", + "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "bin": { + "nanoid": "bin/nanoid.cjs" + }, + "engines": { + "node": "^10 || ^12 || ^13.7 || ^14 || 
>=15.0.1" + } + }, + "node_modules/node-releases": { + "version": "2.0.27", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/node-releases/-/node-releases-2.0.27.tgz", + "integrity": "sha512-nmh3lCkYZ3grZvqcCH+fjmQ7X+H0OeZgP40OierEaAptX4XofMh5kwNbWh7lBduUzCcV/8kZ+NDLCwm2iorIlA==", + "dev": true, + "license": "MIT" + }, + "node_modules/normalize-path": { + "version": "3.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/normalize-path/-/normalize-path-3.0.0.tgz", + "integrity": "sha512-6eZs5Ls3WtCisHWp9S2GUy8dqkpGi4BVSz3GaqiE6ezub0512ESztXUwUB6C6IKbQkY2Pnb/mD4WYojCRwcwLA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/normalize-range": { + "version": "0.1.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/normalize-range/-/normalize-range-0.1.2.tgz", + "integrity": "sha512-bdok/XvKII3nUpklnV6P2hxtMNrCboOjAcyBuQnWEhO665FwrSNRxU+AqpsyvO6LgGYPspN+lu5CLtw4jPRKNA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/object-assign": { + "version": "4.1.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/object-assign/-/object-assign-4.1.1.tgz", + "integrity": "sha512-rJgTQnkUnH1sFw8yT6VSU3zD3sWmu6sZhIseY8VX+GRu3P6F7Fu+JNDoXfklElbLJSnc3FUQHVe4cU5hj+BcUg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/object-hash": { + "version": "3.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/object-hash/-/object-hash-3.0.0.tgz", + "integrity": "sha512-RSn9F68PjH9HqtltsSnqYC1XXoWe9Bju5+213R98cNGttag9q9yAOTzdbsqvIa7aNm5WffBZFpWYr2aWrklWAw==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 6" + } + }, + "node_modules/path-parse": { + "version": "1.0.7", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/path-parse/-/path-parse-1.0.7.tgz", + "integrity": 
"sha512-LDJzPVEEEPR+y48z93A0Ed0yXb8pAByGWo/k5YYdYgpY2/2EsOsksJrq7lOHxryrVOn1ejG6oAp8ahvOIQD8sw==", + "dev": true, + "license": "MIT" + }, + "node_modules/picocolors": { + "version": "1.1.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/picocolors/-/picocolors-1.1.1.tgz", + "integrity": "sha512-xceH2snhtb5M9liqDsmEw56le376mTZkEX/jEb/RxNFyegNul7eNslCXP9FDj/Lcu0X8KEyMceP2ntpaHrDEVA==", + "dev": true, + "license": "ISC" + }, + "node_modules/picomatch": { + "version": "2.3.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/picomatch/-/picomatch-2.3.1.tgz", + "integrity": "sha512-JU3teHTNjmE2VCGFzuY8EXzCDVwEqB2a8fsIvwaStHhAWJEeVd1o1QD80CU6+ZdEXXSLbSsuLwJjkCBWqRQUVA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=8.6" + }, + "funding": { + "url": "https://github.com/sponsors/jonschlinkert" + } + }, + "node_modules/pify": { + "version": "2.3.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/pify/-/pify-2.3.0.tgz", + "integrity": "sha512-udgsAY+fTnvv7kI7aaxbqwWNb0AHiB0qBO89PZKPkoTmGOgdbrHDKD+0B2X4uTfJ/FT1R09r9gTsjUjNJotuog==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/pirates": { + "version": "4.0.7", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/pirates/-/pirates-4.0.7.tgz", + "integrity": "sha512-TfySrs/5nm8fQJDcBDuUng3VOUKsd7S+zqvbOTiGXHfxX4wK31ard+hoNuvkicM/2YFzlpDgABOevKSsB4G/FA==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 6" + } + }, + "node_modules/postcss": { + "version": "8.5.6", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/postcss/-/postcss-8.5.6.tgz", + "integrity": "sha512-3Ybi1tAuwAP9s0r1UQ2J4n5Y0G05bJkpUIO0/bI9MhwmD70S5aTWbXGBwxHrelT+XM1k6dM0pk+SwNkpTRN7Pg==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/postcss/" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/postcss" + }, + { + "type": "github", + 
"url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "peer": true, + "dependencies": { + "nanoid": "^3.3.11", + "picocolors": "^1.1.1", + "source-map-js": "^1.2.1" + }, + "engines": { + "node": "^10 || ^12 || >=14" + } + }, + "node_modules/postcss-import": { + "version": "15.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/postcss-import/-/postcss-import-15.1.0.tgz", + "integrity": "sha512-hpr+J05B2FVYUAXHeK1YyI267J/dDDhMU6B6civm8hSY1jYJnBXxzKDKDswzJmtLHryrjhnDjqqp/49t8FALew==", + "dev": true, + "license": "MIT", + "dependencies": { + "postcss-value-parser": "^4.0.0", + "read-cache": "^1.0.0", + "resolve": "^1.1.7" + }, + "engines": { + "node": ">=14.0.0" + }, + "peerDependencies": { + "postcss": "^8.0.0" + } + }, + "node_modules/postcss-js": { + "version": "4.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/postcss-js/-/postcss-js-4.1.0.tgz", + "integrity": "sha512-oIAOTqgIo7q2EOwbhb8UalYePMvYoIeRY2YKntdpFQXNosSu3vLrniGgmH9OKs/qAkfoj5oB3le/7mINW1LCfw==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/postcss/" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "camelcase-css": "^2.0.1" + }, + "engines": { + "node": "^12 || ^14 || >= 16" + }, + "peerDependencies": { + "postcss": "^8.4.21" + } + }, + "node_modules/postcss-load-config": { + "version": "6.0.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/postcss-load-config/-/postcss-load-config-6.0.1.tgz", + "integrity": "sha512-oPtTM4oerL+UXmx+93ytZVN82RrlY/wPUV8IeDxFrzIjXOLF1pN+EmKPLbubvKHT2HC20xXsCAH2Z+CKV6Oz/g==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/postcss/" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "lilconfig": "^3.1.1" + }, + "engines": { + "node": ">= 18" + }, + 
"peerDependencies": { + "jiti": ">=1.21.0", + "postcss": ">=8.0.9", + "tsx": "^4.8.1", + "yaml": "^2.4.2" + }, + "peerDependenciesMeta": { + "jiti": { + "optional": true + }, + "postcss": { + "optional": true + }, + "tsx": { + "optional": true + }, + "yaml": { + "optional": true + } + } + }, + "node_modules/postcss-nested": { + "version": "6.2.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/postcss-nested/-/postcss-nested-6.2.0.tgz", + "integrity": "sha512-HQbt28KulC5AJzG+cZtj9kvKB93CFCdLvog1WFLf1D+xmMvPGlBstkpTEZfK5+AN9hfJocyBFCNiqyS48bpgzQ==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/postcss/" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "postcss-selector-parser": "^6.1.1" + }, + "engines": { + "node": ">=12.0" + }, + "peerDependencies": { + "postcss": "^8.2.14" + } + }, + "node_modules/postcss-selector-parser": { + "version": "6.1.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/postcss-selector-parser/-/postcss-selector-parser-6.1.2.tgz", + "integrity": "sha512-Q8qQfPiZ+THO/3ZrOrO0cJJKfpYCagtMUkXbnEfmgUjwXg6z/WBeOyS9APBBPCTSiDV+s4SwQGu8yFsiMRIudg==", + "dev": true, + "license": "MIT", + "dependencies": { + "cssesc": "^3.0.0", + "util-deprecate": "^1.0.2" + }, + "engines": { + "node": ">=4" + } + }, + "node_modules/postcss-value-parser": { + "version": "4.2.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/postcss-value-parser/-/postcss-value-parser-4.2.0.tgz", + "integrity": "sha512-1NNCs6uurfkVbeXG4S8JFT9t19m45ICnif8zWLd5oPSZ50QnwMfK+H3jv408d4jw/7Bttv5axS5IiHoLaVNHeQ==", + "dev": true, + "license": "MIT" + }, + "node_modules/proxy-from-env": { + "version": "1.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/proxy-from-env/-/proxy-from-env-1.1.0.tgz", + "integrity": 
"sha512-D+zkORCbA9f1tdWRK0RaCR3GPv50cMxcrz4X8k5LTSUD1Dkw47mKJEZQNunItRTkWwgtaUSo1RVFRIG9ZXiFYg==", + "license": "MIT" + }, + "node_modules/queue-microtask": { + "version": "1.2.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/queue-microtask/-/queue-microtask-1.2.3.tgz", + "integrity": "sha512-NuaNSa6flKT5JaSYQzJok04JzTL1CA6aGhv5rfLW3PgqA+M2ChpZQnAC8h8i4ZFkBS8X5RqkDBHA7r4hej3K9A==", + "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT" + }, + "node_modules/react": { + "version": "18.3.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/react/-/react-18.3.1.tgz", + "integrity": "sha512-wS+hAgJShR0KhEvPJArfuPVN1+Hz1t0Y6n5jLrGQbkb4urgPE/0Rve+1kMB1v/oWgHgm4WIcV+i7F2pTVj+2iQ==", + "license": "MIT", + "peer": true, + "dependencies": { + "loose-envify": "^1.1.0" + }, + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/react-dom": { + "version": "18.3.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/react-dom/-/react-dom-18.3.1.tgz", + "integrity": "sha512-5m4nQKp+rZRb09LNH59GM4BxTh9251/ylbKIbpe7TpGxfJ+9kv6BLkLBXIjjspbgbnIBNqlI23tRnTWT0snUIw==", + "license": "MIT", + "dependencies": { + "loose-envify": "^1.1.0", + "scheduler": "^0.23.2" + }, + "peerDependencies": { + "react": "^18.3.1" + } + }, + "node_modules/react-refresh": { + "version": "0.17.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/react-refresh/-/react-refresh-0.17.0.tgz", + "integrity": "sha512-z6F7K9bV85EfseRCp2bzrpyQ0Gkw1uLoCel9XBVWPg/TjRj94SkJzUTGfOa4bs7iJvBWtQG0Wq7wnI0syw3EBQ==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/read-cache": { + "version": "1.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/read-cache/-/read-cache-1.0.0.tgz", + 
"integrity": "sha512-Owdv/Ft7IjOgm/i0xvNDZ1LrRANRfew4b2prF3OWMQLxLfu3bS8FVhCsrSCMK4lR56Y9ya+AThoTpDCTxCmpRA==", + "dev": true, + "license": "MIT", + "dependencies": { + "pify": "^2.3.0" + } + }, + "node_modules/readdirp": { + "version": "3.6.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/readdirp/-/readdirp-3.6.0.tgz", + "integrity": "sha512-hOS089on8RduqdbhvQ5Z37A0ESjsqz6qnRcffsMU3495FuTdqSm+7bhJ29JvIOsBDEEnan5DPu9t3To9VRlMzA==", + "dev": true, + "license": "MIT", + "dependencies": { + "picomatch": "^2.2.1" + }, + "engines": { + "node": ">=8.10.0" + } + }, + "node_modules/resolve": { + "version": "1.22.11", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/resolve/-/resolve-1.22.11.tgz", + "integrity": "sha512-RfqAvLnMl313r7c9oclB1HhUEAezcpLjz95wFH4LVuhk9JF/r22qmVP9AMmOU4vMX7Q8pN8jwNg/CSpdFnMjTQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "is-core-module": "^2.16.1", + "path-parse": "^1.0.7", + "supports-preserve-symlinks-flag": "^1.0.0" + }, + "bin": { + "resolve": "bin/resolve" + }, + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/reusify": { + "version": "1.1.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/reusify/-/reusify-1.1.0.tgz", + "integrity": "sha512-g6QUff04oZpHs0eG5p83rFLhHeV00ug/Yf9nZM6fLeUrPguBTkTQOdpAWWspMh55TZfVQDPaN3NQJfbVRAxdIw==", + "dev": true, + "license": "MIT", + "engines": { + "iojs": ">=1.0.0", + "node": ">=0.10.0" + } + }, + "node_modules/rollup": { + "version": "4.53.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/rollup/-/rollup-4.53.3.tgz", + "integrity": "sha512-w8GmOxZfBmKknvdXU1sdM9NHcoQejwF/4mNgj2JuEEdRaHwwF12K7e9eXn1nLZ07ad+du76mkVsyeb2rKGllsA==", + "dev": true, + "license": "MIT", + "dependencies": { + "@types/estree": "1.0.8" + }, + "bin": { + "rollup": "dist/bin/rollup" + }, + "engines": { + "node": ">=18.0.0", + "npm": ">=8.0.0" + }, + "optionalDependencies": { + 
"@rollup/rollup-android-arm-eabi": "4.53.3", + "@rollup/rollup-android-arm64": "4.53.3", + "@rollup/rollup-darwin-arm64": "4.53.3", + "@rollup/rollup-darwin-x64": "4.53.3", + "@rollup/rollup-freebsd-arm64": "4.53.3", + "@rollup/rollup-freebsd-x64": "4.53.3", + "@rollup/rollup-linux-arm-gnueabihf": "4.53.3", + "@rollup/rollup-linux-arm-musleabihf": "4.53.3", + "@rollup/rollup-linux-arm64-gnu": "4.53.3", + "@rollup/rollup-linux-arm64-musl": "4.53.3", + "@rollup/rollup-linux-loong64-gnu": "4.53.3", + "@rollup/rollup-linux-ppc64-gnu": "4.53.3", + "@rollup/rollup-linux-riscv64-gnu": "4.53.3", + "@rollup/rollup-linux-riscv64-musl": "4.53.3", + "@rollup/rollup-linux-s390x-gnu": "4.53.3", + "@rollup/rollup-linux-x64-gnu": "4.53.3", + "@rollup/rollup-linux-x64-musl": "4.53.3", + "@rollup/rollup-openharmony-arm64": "4.53.3", + "@rollup/rollup-win32-arm64-msvc": "4.53.3", + "@rollup/rollup-win32-ia32-msvc": "4.53.3", + "@rollup/rollup-win32-x64-gnu": "4.53.3", + "@rollup/rollup-win32-x64-msvc": "4.53.3", + "fsevents": "~2.3.2" + } + }, + "node_modules/run-parallel": { + "version": "1.2.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/run-parallel/-/run-parallel-1.2.0.tgz", + "integrity": "sha512-5l4VyZR86LZ/lDxZTR6jqL8AFE2S0IFLMP26AbjsLVADxHdhB/c0GUsH+y39UfCi3dzz8OlQuPmnaJOMoDHQBA==", + "dev": true, + "funding": [ + { + "type": "github", + "url": "https://github.com/sponsors/feross" + }, + { + "type": "patreon", + "url": "https://www.patreon.com/feross" + }, + { + "type": "consulting", + "url": "https://feross.org/support" + } + ], + "license": "MIT", + "dependencies": { + "queue-microtask": "^1.2.2" + } + }, + "node_modules/scheduler": { + "version": "0.23.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/scheduler/-/scheduler-0.23.2.tgz", + "integrity": "sha512-UOShsPwz7NrMUqhR6t0hWjFduvOzbtv7toDH1/hIrfRNIDBnnBWd0CwJTGvTpngVlmwGCdP9/Zl/tVrDqcuYzQ==", + "license": "MIT", + "dependencies": { + "loose-envify": "^1.1.0" + } + }, + 
"node_modules/semver": { + "version": "6.3.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/semver/-/semver-6.3.1.tgz", + "integrity": "sha512-BR7VvDCVHO+q2xBEWskxS6DJE1qRnb7DxzUrogb71CWoSficBxYsiAGd+Kl0mmq/MprG9yArRkyrQxTO6XjMzA==", + "dev": true, + "license": "ISC", + "bin": { + "semver": "bin/semver.js" + } + }, + "node_modules/source-map-js": { + "version": "1.2.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/source-map-js/-/source-map-js-1.2.1.tgz", + "integrity": "sha512-UXWMKhLOwVKb728IUtQPXxfYU+usdybtUrK/8uGE8CQMvrhOpwvzDBwj0QhSL7MQc7vIsISBG8VQ8+IDQxpfQA==", + "dev": true, + "license": "BSD-3-Clause", + "engines": { + "node": ">=0.10.0" + } + }, + "node_modules/sucrase": { + "version": "3.35.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/sucrase/-/sucrase-3.35.1.tgz", + "integrity": "sha512-DhuTmvZWux4H1UOnWMB3sk0sbaCVOoQZjv8u1rDoTV0HTdGem9hkAZtl4JZy8P2z4Bg0nT+YMeOFyVr4zcG5Tw==", + "dev": true, + "license": "MIT", + "dependencies": { + "@jridgewell/gen-mapping": "^0.3.2", + "commander": "^4.0.0", + "lines-and-columns": "^1.1.6", + "mz": "^2.7.0", + "pirates": "^4.0.1", + "tinyglobby": "^0.2.11", + "ts-interface-checker": "^0.1.9" + }, + "bin": { + "sucrase": "bin/sucrase", + "sucrase-node": "bin/sucrase-node" + }, + "engines": { + "node": ">=16 || 14 >=14.17" + } + }, + "node_modules/supports-preserve-symlinks-flag": { + "version": "1.0.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/supports-preserve-symlinks-flag/-/supports-preserve-symlinks-flag-1.0.0.tgz", + "integrity": "sha512-ot0WnXS9fgdkgIcePe6RHNk1WA8+muPa6cSjeR3V8K27q9BB1rTE3R1p7Hv0z1ZyAc8s6Vvv8DIyWf681MAt0w==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">= 0.4" + }, + "funding": { + "url": "https://github.com/sponsors/ljharb" + } + }, + "node_modules/tailwindcss": { + "version": "3.4.18", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/tailwindcss/-/tailwindcss-3.4.18.tgz", + "integrity": 
"sha512-6A2rnmW5xZMdw11LYjhcI5846rt9pbLSabY5XPxo+XWdxwZaFEn47Go4NzFiHu9sNNmr/kXivP1vStfvMaK1GQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "@alloc/quick-lru": "^5.2.0", + "arg": "^5.0.2", + "chokidar": "^3.6.0", + "didyoumean": "^1.2.2", + "dlv": "^1.1.3", + "fast-glob": "^3.3.2", + "glob-parent": "^6.0.2", + "is-glob": "^4.0.3", + "jiti": "^1.21.7", + "lilconfig": "^3.1.3", + "micromatch": "^4.0.8", + "normalize-path": "^3.0.0", + "object-hash": "^3.0.0", + "picocolors": "^1.1.1", + "postcss": "^8.4.47", + "postcss-import": "^15.1.0", + "postcss-js": "^4.0.1", + "postcss-load-config": "^4.0.2 || ^5.0 || ^6.0", + "postcss-nested": "^6.2.0", + "postcss-selector-parser": "^6.1.2", + "resolve": "^1.22.8", + "sucrase": "^3.35.0" + }, + "bin": { + "tailwind": "lib/cli.js", + "tailwindcss": "lib/cli.js" + }, + "engines": { + "node": ">=14.0.0" + } + }, + "node_modules/thenify": { + "version": "3.3.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/thenify/-/thenify-3.3.1.tgz", + "integrity": "sha512-RVZSIV5IG10Hk3enotrhvz0T9em6cyHBLkH/YAZuKqd8hRkKhSfCGIcP2KUY0EPxndzANBmNllzWPwak+bheSw==", + "dev": true, + "license": "MIT", + "dependencies": { + "any-promise": "^1.0.0" + } + }, + "node_modules/thenify-all": { + "version": "1.6.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/thenify-all/-/thenify-all-1.6.0.tgz", + "integrity": "sha512-RNxQH/qI8/t3thXJDwcstUO4zeqo64+Uy/+sNVRBx4Xn2OX+OZ9oP+iJnNFqplFra2ZUVeKCSa2oVWi3T4uVmA==", + "dev": true, + "license": "MIT", + "dependencies": { + "thenify": ">= 3.1.0 < 4" + }, + "engines": { + "node": ">=0.8" + } + }, + "node_modules/tinyglobby": { + "version": "0.2.15", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/tinyglobby/-/tinyglobby-0.2.15.tgz", + "integrity": "sha512-j2Zq4NyQYG5XMST4cbs02Ak8iJUdxRM0XI5QyxXuZOzKOINmWurp3smXu3y5wDcJrptwpSjgXHzIQxR0omXljQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "fdir": "^6.5.0", + "picomatch": "^4.0.3" + }, + 
"engines": { + "node": ">=12.0.0" + }, + "funding": { + "url": "https://github.com/sponsors/SuperchupuDev" + } + }, + "node_modules/tinyglobby/node_modules/fdir": { + "version": "6.5.0", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/fdir/-/fdir-6.5.0.tgz", + "integrity": "sha512-tIbYtZbucOs0BRGqPJkshJUYdL+SDH7dVM8gjy+ERp3WAUjLEFJE+02kanyHtwjWOnwrKYBiwAmM0p4kLJAnXg==", + "dev": true, + "license": "MIT", + "engines": { + "node": ">=12.0.0" + }, + "peerDependencies": { + "picomatch": "^3 || ^4" + }, + "peerDependenciesMeta": { + "picomatch": { + "optional": true + } + } + }, + "node_modules/tinyglobby/node_modules/picomatch": { + "version": "4.0.3", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/picomatch/-/picomatch-4.0.3.tgz", + "integrity": "sha512-5gTmgEY/sqK6gFXLIsQNH19lWb4ebPDLA4SdLP7dsWkIXHWlG66oPuVvXSGFPppYZz8ZDZq0dYYrbHfBCVUb1Q==", + "dev": true, + "license": "MIT", + "peer": true, + "engines": { + "node": ">=12" + }, + "funding": { + "url": "https://github.com/sponsors/jonschlinkert" + } + }, + "node_modules/to-regex-range": { + "version": "5.0.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/to-regex-range/-/to-regex-range-5.0.1.tgz", + "integrity": "sha512-65P7iz6X5yEr1cwcgvQxbbIw7Uk3gOy5dIdtZ4rDveLqhrdJP+Li/Hx6tyK0NEb+2GCyneCMJiGqrADCSNk8sQ==", + "dev": true, + "license": "MIT", + "dependencies": { + "is-number": "^7.0.0" + }, + "engines": { + "node": ">=8.0" + } + }, + "node_modules/ts-interface-checker": { + "version": "0.1.13", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/ts-interface-checker/-/ts-interface-checker-0.1.13.tgz", + "integrity": "sha512-Y/arvbn+rrz3JCKl9C4kVNfTfSm2/mEp5FSz5EsZSANGPSlQrpRI5M4PKF+mJnE52jOO90PnPSc3Ur3bTQw0gA==", + "dev": true, + "license": "Apache-2.0" + }, + "node_modules/update-browserslist-db": { + "version": "1.1.4", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/update-browserslist-db/-/update-browserslist-db-1.1.4.tgz", + "integrity": 
"sha512-q0SPT4xyU84saUX+tomz1WLkxUbuaJnR1xWt17M7fJtEJigJeWUNGUqrauFXsHnqev9y9JTRGwk13tFBuKby4A==", + "dev": true, + "funding": [ + { + "type": "opencollective", + "url": "https://opencollective.com/browserslist" + }, + { + "type": "tidelift", + "url": "https://tidelift.com/funding/github/npm/browserslist" + }, + { + "type": "github", + "url": "https://github.com/sponsors/ai" + } + ], + "license": "MIT", + "dependencies": { + "escalade": "^3.2.0", + "picocolors": "^1.1.1" + }, + "bin": { + "update-browserslist-db": "cli.js" + }, + "peerDependencies": { + "browserslist": ">= 4.21.0" + } + }, + "node_modules/util-deprecate": { + "version": "1.0.2", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/util-deprecate/-/util-deprecate-1.0.2.tgz", + "integrity": "sha512-EPD5q1uXyFxJpCrLnCc1nHnq3gOa6DZBocAIiI2TaSCA7VCJ1UJDMagCzIkXNsUYfD1daK//LTEQ8xiIbrHtcw==", + "dev": true, + "license": "MIT" + }, + "node_modules/vite": { + "version": "5.4.21", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/vite/-/vite-5.4.21.tgz", + "integrity": "sha512-o5a9xKjbtuhY6Bi5S3+HvbRERmouabWbyUcpXXUA1u+GNUKoROi9byOJ8M0nHbHYHkYICiMlqxkg1KkYmm25Sw==", + "dev": true, + "license": "MIT", + "peer": true, + "dependencies": { + "esbuild": "^0.21.3", + "postcss": "^8.4.43", + "rollup": "^4.20.0" + }, + "bin": { + "vite": "bin/vite.js" + }, + "engines": { + "node": "^18.0.0 || >=20.0.0" + }, + "funding": { + "url": "https://github.com/vitejs/vite?sponsor=1" + }, + "optionalDependencies": { + "fsevents": "~2.3.3" + }, + "peerDependencies": { + "@types/node": "^18.0.0 || >=20.0.0", + "less": "*", + "lightningcss": "^1.21.0", + "sass": "*", + "sass-embedded": "*", + "stylus": "*", + "sugarss": "*", + "terser": "^5.4.0" + }, + "peerDependenciesMeta": { + "@types/node": { + "optional": true + }, + "less": { + "optional": true + }, + "lightningcss": { + "optional": true + }, + "sass": { + "optional": true + }, + "sass-embedded": { + "optional": true + }, + "stylus": { + "optional": 
true + }, + "sugarss": { + "optional": true + }, + "terser": { + "optional": true + } + } + }, + "node_modules/yallist": { + "version": "3.1.1", + "resolved": "https://mirrors.huaweicloud.com/repository/npm/yallist/-/yallist-3.1.1.tgz", + "integrity": "sha512-a4UGQaWPH59mOXUYnAG2ewncQS4i4F43Tv3JoAM+s2VDAmS9NsK8GpDMLrCHPksFT7h3K6TOoUNn2pb7RoXx4g==", + "dev": true, + "license": "ISC" + } + } +} diff --git a/frontend/package.json b/frontend/package.json new file mode 100644 index 0000000..7f0b131 --- /dev/null +++ b/frontend/package.json @@ -0,0 +1,26 @@ +{ + "name": "medical-report-analyzer", + "private": true, + "version": "1.0.0", + "type": "module", + "scripts": { + "dev": "vite", + "build": "vite build", + "preview": "vite preview" + }, + "dependencies": { + "react": "^18.2.0", + "react-dom": "^18.2.0", + "axios": "^1.6.0", + "lucide-react": "^0.292.0" + }, + "devDependencies": { + "@types/react": "^18.2.37", + "@types/react-dom": "^18.2.15", + "@vitejs/plugin-react": "^4.2.0", + "autoprefixer": "^10.4.16", + "postcss": "^8.4.31", + "tailwindcss": "^3.3.5", + "vite": "^5.0.0" + } +} diff --git a/frontend/postcss.config.js b/frontend/postcss.config.js new file mode 100644 index 0000000..2e7af2b --- /dev/null +++ b/frontend/postcss.config.js @@ -0,0 +1,6 @@ +export default { + plugins: { + tailwindcss: {}, + autoprefixer: {}, + }, +} diff --git a/frontend/src/App.jsx b/frontend/src/App.jsx new file mode 100644 index 0000000..59d8aed --- /dev/null +++ b/frontend/src/App.jsx @@ -0,0 +1,261 @@ +import React, { useState, useEffect } from 'react' +import { FileText, Search, Brain, Check, AlertCircle, Trash2, Layers } from 'lucide-react' +import FileUpload from './components/FileUpload' +import ReportList from './components/ReportList' +import ReportDetail from './components/ReportDetail' +import IntegrationResult from './components/IntegrationResult' +import BatchUpload from './components/BatchUpload' +import { api } from './services/api' + +function App() { + const 
[reports, setReports] = useState([]) + const [selectedReport, setSelectedReport] = useState(null) + const [integrationResult, setIntegrationResult] = useState(null) + const [selectedReports, setSelectedReports] = useState([]) + const [loading, setLoading] = useState(false) + const [error, setError] = useState(null) + const [activeTab, setActiveTab] = useState('batch') // 默认使用批量模式 + + const loadReports = async () => { + try { + const data = await api.getReports() + setReports(data.reports || []) + } catch (err) { + setError('加载报告列表失败') + } + } + + const handleFileUpload = async (file) => { + try { + setLoading(true) + setError(null) + const result = await api.uploadFile(file) + await loadReports() + return result + } catch (err) { + setError('文件上传失败: ' + err.message) + throw err + } finally { + setLoading(false) + } + } + + const handleOCR = async (fileId) => { + try { + setLoading(true) + setError(null) + await api.performOCR(fileId) + await loadReports() + } catch (err) { + setError('OCR识别失败: ' + err.message) + } finally { + setLoading(false) + } + } + + const handleAnalyze = async (fileId) => { + try { + setLoading(true) + setError(null) + await api.analyzeReport(fileId) + await loadReports() + const detail = await api.getReportDetail(fileId) + setSelectedReport(detail.report) + } catch (err) { + setError('报告分析失败: ' + err.message) + } finally { + setLoading(false) + } + } + + const handleIntegrate = async () => { + if (selectedReports.length === 0) { + setError('请至少选择一份报告进行整合') + return + } + + try { + setLoading(true) + setError(null) + const result = await api.integrateReports(selectedReports) + setIntegrationResult(result) + } catch (err) { + setError('报告整合失败: ' + err.message) + } finally { + setLoading(false) + } + } + + const handleDelete = async (fileId) => { + if (!confirm('确定要删除这份报告吗?')) return + + try { + setLoading(true) + await api.deleteReport(fileId) + await loadReports() + if (selectedReport?.id === fileId) { + setSelectedReport(null) + } + 
setSelectedReports(prev => prev.filter(id => id !== fileId)) + } catch (err) { + setError('删除失败: ' + err.message) + } finally { + setLoading(false) + } + } + + const toggleReportSelection = (reportId) => { + setSelectedReports(prev => + prev.includes(reportId) + ? prev.filter(id => id !== reportId) + : [...prev, reportId] + ) + } + + const viewReportDetail = async (reportId) => { + try { + const detail = await api.getReportDetail(reportId) + setSelectedReport(detail.report) + setIntegrationResult(null) + } catch (err) { + setError('加载报告详情失败') + } + } + + const handleGeneratePDF = async (fileId) => { + try { + setLoading(true) + setError(null) + const result = await api.generatePDF(fileId) + // 生成成功后直接下载 + api.downloadPDF(fileId) + // 显示成功提示 + alert('PDF报告生成成功!') + } catch (err) { + setError('PDF生成失败: ' + err.message) + } finally { + setLoading(false) + } + } + + const handleBatchGenerate = async (files, patientName) => { + try { + setLoading(true) + setError(null) + const result = await api.generateComprehensiveReport(files, patientName) + // 生成成功后直接下载 + api.downloadComprehensiveReport(result.pdf_filename) + // 显示成功提示 + alert(`综合健康报告生成成功!\n患者:${result.patient_name}\n报告数量:${result.report_count} 份`) + } catch (err) { + setError('综合报告生成失败: ' + err.message) + } finally { + setLoading(false) + } + } + + return ( +
+ {/* Header */} +
+
+
+
+ +

医疗报告分析系统

+
+
+
+
+ 系统运行中 +
+
+
+
+
+ +
+ {/* Error Alert */} + {error && ( +
+ +
+

错误

+

{error}

+
+ +
+ )} + +
+ {/* Left Panel - Upload & Reports */} +
+ {/* 选项卡切换 */} +
+ +
+ + {/* File Upload */} + + + {/* Reports List - 只在单个上传模式显示 */} + {false && activeTab === 'single' && ( + + )} + + {/* Integration Button */} + {false && reports.length > 0 && ( +
+ +
+ )} +
+ + {/* Right Panel - Details */} +
+ {integrationResult ? ( + setIntegrationResult(null)} /> + ) : selectedReport ? ( + setSelectedReport(null)} + onGeneratePDF={handleGeneratePDF} + /> + ) : ( +
+ +

上传医疗报告开始分析

+

支持 JPG、PNG、PDF 格式

+
+ )} +
+
+
+
+ ) +} + +export default App diff --git a/frontend/src/components/BatchUpload.jsx b/frontend/src/components/BatchUpload.jsx new file mode 100644 index 0000000..599af2c --- /dev/null +++ b/frontend/src/components/BatchUpload.jsx @@ -0,0 +1,191 @@ +import React, { useState } from 'react' +import { Upload, X, FileText, User, Loader } from 'lucide-react' + +function BatchUpload({ onGenerate, loading }) { + const [selectedFiles, setSelectedFiles] = useState([]) + const [patientName, setPatientName] = useState('') + const [dragActive, setDragActive] = useState(false) + + const handleFileChange = (e) => { + const files = Array.from(e.target.files) + addFiles(files) + } + + const addFiles = (files) => { + const validFiles = files.filter(file => { + const ext = file.name.toLowerCase() + return ext.endsWith('.pdf') || ext.endsWith('.jpg') || + ext.endsWith('.jpeg') || ext.endsWith('.png') + }) + + setSelectedFiles(prev => [...prev, ...validFiles]) + } + + const removeFile = (index) => { + setSelectedFiles(prev => prev.filter((_, i) => i !== index)) + } + + const handleDrag = (e) => { + e.preventDefault() + e.stopPropagation() + if (e.type === 'dragenter' || e.type === 'dragover') { + setDragActive(true) + } else if (e.type === 'dragleave') { + setDragActive(false) + } + } + + const handleDrop = (e) => { + e.preventDefault() + e.stopPropagation() + setDragActive(false) + + const files = Array.from(e.dataTransfer.files) + addFiles(files) + } + + const handleSubmit = async () => { + if (selectedFiles.length === 0) { + alert('请至少选择一个文件') + return + } + + if (!patientName.trim()) { + alert('请输入患者姓名') + return + } + + await onGenerate(selectedFiles, patientName.trim()) + } + + const formatFileSize = (bytes) => { + if (bytes === 0) return '0 Bytes' + const k = 1024 + const sizes = ['Bytes', 'KB', 'MB', 'GB'] + const i = Math.floor(Math.log(bytes) / Math.log(k)) + return Math.round(bytes / Math.pow(k, i) * 100) / 100 + ' ' + sizes[i] + } + + return ( +
+

+ + 批量上传生成综合报告 +

+ +

+ 上传多个检测报告(血常规、尿常规、生化检查等),系统将自动识别并生成综合健康报告 +

+ + {/* 患者姓名输入 */} +
+ + setPatientName(e.target.value)} + placeholder="请输入患者姓名" + className="w-full px-4 py-2 border border-gray-300 rounded-lg focus:ring-2 focus:ring-indigo-500 focus:border-transparent" + disabled={loading} + /> +
+ + {/* 文件上传区域 */} +
+ +

+ 拖拽文件到此处,或 + +

+

支持 PDF、JPG、PNG 格式,可多选

+
+ + {/* 已选文件列表 */} + {selectedFiles.length > 0 && ( +
+

+ 已选择 {selectedFiles.length} 个文件 +

+
+ {selectedFiles.map((file, index) => ( +
+
+ +
+

+ {file.name} +

+

+ {formatFileSize(file.size)} +

+
+
+ +
+ ))} +
+
+ )} + + {/* 生成按钮 */} + + + {/* 说明 */} +
+

+ 注意:上传的文件仅用于生成报告,处理完成后会自动删除,不会永久保存在系统中。 +

+
+
+ ) +} + +export default BatchUpload diff --git a/frontend/src/components/FileUpload.jsx b/frontend/src/components/FileUpload.jsx new file mode 100644 index 0000000..3f0649c --- /dev/null +++ b/frontend/src/components/FileUpload.jsx @@ -0,0 +1,91 @@ +import React, { useState } from 'react' +import { Upload, Loader } from 'lucide-react' + +function FileUpload({ onUpload, loading }) { + const [dragging, setDragging] = useState(false) + + const handleDragOver = (e) => { + e.preventDefault() + setDragging(true) + } + + const handleDragLeave = () => { + setDragging(false) + } + + const handleDrop = (e) => { + e.preventDefault() + setDragging(false) + const files = Array.from(e.dataTransfer.files) + if (files.length > 0) { + handleFileSelect(files[0]) + } + } + + const handleFileSelect = async (file) => { + if (!file) return + + const allowedTypes = ['image/jpeg', 'image/png', 'image/jpg', 'application/pdf'] + if (!allowedTypes.includes(file.type)) { + alert('不支持的文件格式,请上传 JPG、PNG 或 PDF 文件') + return + } + + try { + await onUpload(file) + } catch (err) { + console.error('Upload failed:', err) + } + } + + return ( +
+

+ + 上传医疗报告 +

+ +
+ {loading ? ( +
+ +

上传中...

+
+ ) : ( + <> + +

拖拽文件到此处或点击上传

+

支持 JPG、PNG、PDF 格式

+ + + )} +
+
+ ) +} + +export default FileUpload diff --git a/frontend/src/components/IntegrationResult.jsx b/frontend/src/components/IntegrationResult.jsx new file mode 100644 index 0000000..023cab9 --- /dev/null +++ b/frontend/src/components/IntegrationResult.jsx @@ -0,0 +1,161 @@ +import React from 'react' +import { X, Brain, TrendingUp, AlertTriangle, Lightbulb, Calendar } from 'lucide-react' + +function IntegrationResult({ result, onClose }) { + const getSeverityColor = (severity) => { + switch (severity) { + case '高': return 'bg-red-100 border-red-300 text-red-900' + case '中': return 'bg-yellow-100 border-yellow-300 text-yellow-900' + case '低': return 'bg-green-100 border-green-300 text-green-900' + default: return 'bg-gray-100 border-gray-300 text-gray-900' + } + } + + return ( +
+ {/* Header */} +
+
+

+ + 综合分析报告 +

+

+ 整合了 {result.report_count || result.reports_included?.length || 0} 份医疗报告 +

+
+ +
+ + {/* Content */} +
+ {/* Overall Summary */} +
+

整体健康状况

+

+ {result.integrated_analysis?.overall_summary || result.overall_summary || '暂无摘要'} +

+
+ + {/* Health Trends */} + {(result.integrated_analysis?.health_trends || result.health_trends) && ( +
+

+ + 健康趋势 +

+
+ {(result.integrated_analysis?.health_trends || result.health_trends).map((trend, index) => ( +
+

{trend}

+
+ ))} +
+
+ )} + + {/* Priority Concerns */} + {(result.integrated_analysis?.priority_concerns || result.priority_concerns) && + (result.integrated_analysis?.priority_concerns || result.priority_concerns).length > 0 && ( +
+

+ + 优先关注事项 +

+
+ {(result.integrated_analysis?.priority_concerns || result.priority_concerns).map((concern, index) => ( +
+
+

{concern.concern}

+ + {concern.severity} + +
+

{concern.description}

+
+ ))} +
+
+ )} + + {/* Comprehensive Assessment */} + {(result.integrated_analysis?.comprehensive_assessment || result.comprehensive_assessment) && ( +
+

综合评估

+

+ {result.integrated_analysis?.comprehensive_assessment || result.comprehensive_assessment} +

+
+ )} + + {/* Integrated Recommendations */} + {(result.integrated_analysis?.integrated_recommendations || result.integrated_recommendations) && ( +
+

+ + 综合建议 +

+
    + {(result.integrated_analysis?.integrated_recommendations || result.integrated_recommendations).map((rec, index) => ( +
  • + + {rec} +
  • + ))} +
+
+ )} + + {/* Follow-up Suggestions */} + {(result.integrated_analysis?.follow_up_suggestions || result.follow_up_suggestions) && ( +
+

+ + 后续跟踪建议 +

+
    + {(result.integrated_analysis?.follow_up_suggestions || result.follow_up_suggestions).map((suggestion, index) => ( +
  • + + {suggestion} +
  • + ))} +
+
+ )} + + {/* Reports Included */} + {result.reports_included && result.reports_included.length > 0 && ( +
+

包含的报告

+
+ {result.reports_included.map((report, index) => ( +
+ {index + 1}. + {report.filename} +
+ ))} +
+
+ )} + + {/* Note */} + {result.note && ( +
+

{result.note}

+
+ )} +
+
+ ) +} + +export default IntegrationResult diff --git a/frontend/src/components/ReportDetail.jsx b/frontend/src/components/ReportDetail.jsx new file mode 100644 index 0000000..4646b85 --- /dev/null +++ b/frontend/src/components/ReportDetail.jsx @@ -0,0 +1,184 @@ +import React, { useState } from 'react' +import { X, FileText, AlertTriangle, CheckCircle, Activity, Download } from 'lucide-react' + +function ReportDetail({ report, onClose, onGeneratePDF }) { + const [generatingPDF, setGeneratingPDF] = useState(false) + + const handleGeneratePDF = async () => { + setGeneratingPDF(true) + try { + await onGeneratePDF(report.id) + } finally { + setGeneratingPDF(false) + } + } + const analysis = report.analysis || {} + + // 兼容不同结构的返回结果 + const keyFindings = Array.isArray(analysis.key_findings) + ? analysis.key_findings.map((item) => + typeof item === 'string' + ? item + : item?.finding || item?.text || JSON.stringify(item) + ) + : [] + + const abnormalItems = Array.isArray(analysis.abnormal_items) + ? analysis.abnormal_items + : [] + + let riskAssessmentText = '' + if (typeof analysis.risk_assessment === 'string') { + riskAssessmentText = analysis.risk_assessment + } else if (analysis.risk_assessment && typeof analysis.risk_assessment === 'object') { + const ra = analysis.risk_assessment + const parts = [] + if (Array.isArray(ra.high_risk) && ra.high_risk.length > 0) { + parts.push(`【高风险】${ra.high_risk.join(';')}`) + } + if (Array.isArray(ra.medium_risk) && ra.medium_risk.length > 0) { + parts.push(`【中风险】${ra.medium_risk.join(';')}`) + } + if (Array.isArray(ra.low_risk) && ra.low_risk.length > 0) { + parts.push(`【低风险】${ra.low_risk.join(';')}`) + } + riskAssessmentText = parts.join('\n') + } + + const recommendations = Array.isArray(analysis.recommendations) + ? analysis.recommendations.map((item) => + typeof item === 'string' + ? item + : item?.recommendation || item?.text || JSON.stringify(item) + ) + : [] + + return ( +
+ {/* Header */} +
+
+

+ + 报告详情 +

+

{report.filename}

+
+
+ + +
+
+ + {/* Content */} +
+ {/* Summary */} +
+

+ + 摘要 +

+

{analysis.summary || '暂无摘要'}

+
+ + {/* Key Findings */} + {keyFindings.length > 0 && ( +
+

+ + 关键发现 +

+
    + {keyFindings.map((finding, index) => ( +
  • + + {finding} +
  • + ))} +
+
+ )} + + {/* Abnormal Items */} + {abnormalItems.length > 0 && ( +
+

+ + 异常指标 +

+
+ {abnormalItems.map((item, index) => ( +
+

{typeof item === 'string' ? item : item?.name || JSON.stringify(item)}

+ {item.value && ( +

+ 测量值: {item.value} {item.reference && `(参考: ${item.reference})`} +

+ )} +
+ ))} +
+
+ )} + + {/* Risk Assessment */} + {riskAssessmentText && ( +
+

风险评估

+
+

{riskAssessmentText}

+
+
+ )} + + {/* Recommendations */} + {recommendations.length > 0 && ( +
+

建议

+
    + {recommendations.map((rec, index) => ( +
  • + + {rec} +
  • + ))} +
+
+ )} + + {/* OCR Text */} + {report.ocr_text && ( +
+

原始文本

+
+
+                {report.ocr_text}
+              
+
+
+ )} + + {/* Note */} + {analysis.note && ( +
+

{analysis.note}

+
+ )} +
+
+ ) +} + +export default ReportDetail diff --git a/frontend/src/components/ReportList.jsx b/frontend/src/components/ReportList.jsx new file mode 100644 index 0000000..d059f47 --- /dev/null +++ b/frontend/src/components/ReportList.jsx @@ -0,0 +1,113 @@ +import React from 'react' +import { FileText, Search, Brain, Trash2, CheckCircle, Clock, AlertCircle } from 'lucide-react' + +function ReportList({ reports, selectedReports, onToggleSelect, onView, onOCR, onAnalyze, onDelete, loading }) { + const getStatusIcon = (report) => { + if (report.has_analysis) return + if (report.has_ocr) return + return + } + + const getStatusText = (report) => { + if (report.has_analysis) return '已分析' + if (report.has_ocr) return '已识别' + return '已上传' + } + + return ( +
+

+ 报告列表 ({reports.length}) +

+ + {reports.length === 0 ? ( +
+ +

暂无报告

+
+ ) : ( +
+ {reports.map((report) => ( +
+
+ onToggleSelect(report.id)} + className="mt-1 w-4 h-4 text-indigo-600 rounded focus:ring-indigo-500" + disabled={!report.has_analysis} + /> + +
+
+ +

+ {report.filename} +

+
+ +
+ {getStatusIcon(report)} + {getStatusText(report)} +
+ +
+ {!report.has_ocr && ( + + )} + + {report.has_ocr && !report.has_analysis && ( + + )} + + {report.has_analysis && ( + + )} + + +
+
+
+
+ ))} +
+ )} +
+ ) +} + +export default ReportList diff --git a/frontend/src/index.css b/frontend/src/index.css new file mode 100644 index 0000000..5e4b162 --- /dev/null +++ b/frontend/src/index.css @@ -0,0 +1,16 @@ +@tailwind base; +@tailwind components; +@tailwind utilities; + +body { + margin: 0; + font-family: -apple-system, BlinkMacSystemFont, 'Segoe UI', 'Roboto', 'Oxygen', + 'Ubuntu', 'Cantarell', 'Fira Sans', 'Droid Sans', 'Helvetica Neue', + sans-serif; + -webkit-font-smoothing: antialiased; + -moz-osx-font-smoothing: grayscale; +} + +code { + font-family: source-code-pro, Menlo, Monaco, Consolas, 'Courier New', monospace; +} diff --git a/frontend/src/main.jsx b/frontend/src/main.jsx new file mode 100644 index 0000000..54b39dd --- /dev/null +++ b/frontend/src/main.jsx @@ -0,0 +1,10 @@ +import React from 'react' +import ReactDOM from 'react-dom/client' +import App from './App.jsx' +import './index.css' + +ReactDOM.createRoot(document.getElementById('root')).render( + + + , +) diff --git a/frontend/src/services/api.js b/frontend/src/services/api.js new file mode 100644 index 0000000..c56eba9 --- /dev/null +++ b/frontend/src/services/api.js @@ -0,0 +1,96 @@ +import axios from 'axios' + +const API_BASE_URL = 'http://localhost:8001' + +const apiClient = axios.create({ + baseURL: API_BASE_URL, + headers: { + 'Content-Type': 'application/json', + }, +}) + +export const api = { + // 上传文件 + uploadFile: async (file) => { + const formData = new FormData() + formData.append('file', file) + const response = await axios.post(`${API_BASE_URL}/api/upload`, formData, { + headers: { 'Content-Type': 'multipart/form-data' } + }) + return response.data + }, + + // 执行OCR + performOCR: async (fileId) => { + const response = await apiClient.post(`/api/ocr/${fileId}`) + return response.data + }, + + // 分析报告 + analyzeReport: async (fileId) => { + const response = await apiClient.post(`/api/analyze/${fileId}`) + return response.data + }, + + // 整合报告 + integrateReports: async (fileIds) => { + const 
response = await apiClient.post('/api/integrate', fileIds) + return response.data + }, + + // 获取所有报告 + getReports: async () => { + const response = await apiClient.get('/api/reports') + return response.data + }, + + // 获取报告详情 + getReportDetail: async (fileId) => { + const response = await apiClient.get(`/api/report/${fileId}`) + return response.data + }, + + // 删除报告 + deleteReport: async (fileId) => { + const response = await apiClient.delete(`/api/report/${fileId}`) + return response.data + }, + + // 生成PDF报告 + generatePDF: async (fileId) => { + const response = await apiClient.post(`/api/report/${fileId}/pdf`) + return response.data + }, + + // 下载PDF报告 + downloadPDF: (fileId) => { + window.open(`${API_BASE_URL}/api/report/${fileId}/pdf/download`, '_blank') + }, + + // 批量上传并生成综合报告 + generateComprehensiveReport: async (files, patientName = '患者') => { + const formData = new FormData() + + // 添加所有文件 + files.forEach(file => { + formData.append('files', file) + }) + + // 添加患者姓名 + formData.append('patient_name', patientName) + + const response = await axios.post( + `${API_BASE_URL}/api/comprehensive-report`, + formData, + { + headers: { 'Content-Type': 'multipart/form-data' } + } + ) + return response.data + }, + + // 下载综合报告 + downloadComprehensiveReport: (pdfFilename) => { + window.open(`${API_BASE_URL}/api/comprehensive-report/download/${pdfFilename}`, '_blank') + }, +} diff --git a/frontend/tailwind.config.js b/frontend/tailwind.config.js new file mode 100644 index 0000000..dca8ba0 --- /dev/null +++ b/frontend/tailwind.config.js @@ -0,0 +1,11 @@ +/** @type {import('tailwindcss').Config} */ +export default { + content: [ + "./index.html", + "./src/**/*.{js,ts,jsx,tsx}", + ], + theme: { + extend: {}, + }, + plugins: [], +} diff --git a/frontend/vite.config.js b/frontend/vite.config.js new file mode 100644 index 0000000..351b16e --- /dev/null +++ b/frontend/vite.config.js @@ -0,0 +1,15 @@ +import { defineConfig } from 'vite' +import react from '@vitejs/plugin-react' + 
+export default defineConfig({
+  plugins: [react()],
+  server: {
+    port: 5173,
+    proxy: {
+      '/api': {
+        target: 'http://localhost:8001',
+        changeOrigin: true
+      }
+    }
+  }
+})
diff --git a/template_docxtpl.docx b/template_docxtpl.docx
new file mode 100644
index 0000000..317f4ad
Binary files /dev/null and b/template_docxtpl.docx differ
diff --git a/医疗干预模板(1).docx b/医疗干预模板(1).docx
new file mode 100644
index 0000000..f85e1f3
Binary files /dev/null and b/医疗干预模板(1).docx differ
diff --git a/检查项目分类.md b/检查项目分类.md
new file mode 100644
index 0000000..602a26e
--- /dev/null
+++ b/检查项目分类.md
@@ -0,0 +1,94 @@
+# Medical Test Item Classification
+
+## 1. Liver Function
+
+| Abbreviation | English Name | Chinese Name |
+|------|----------|----------|
+| ALT | Alanine Aminotransferase | 丙氨酸氨基转移酶 |
+| AST | Aspartate Aminotransferase | 天门冬氨酸氨基转移酶 |
+| GGT | Gamma-Glutamyl Transferase | 谷氨酰转肽酶 |
+| ALP | Alkaline Phosphatase | 碱性磷酸酶 |
+| TBIL | Total Bilirubin | 总胆红素 |
+| DBIL | Direct Bilirubin | 直接胆红素 |
+| TP | Total Protein | 总蛋白 |
+| ALB | Albumin | 白蛋白 |
+| GLB | Globulin | 球蛋白 |
+| LDH | Lactate Dehydrogenase | 乳酸脱氢酶 |
+
+---
+
+## 2. Kidney Function
+
+| Abbreviation | English Name | Chinese Name |
+|------|----------|----------|
+| BUN | Blood Urea Nitrogen | 尿素氮 |
+| CREA | Creatinine | 肌酐 |
+| UA | Uric Acid | 尿酸 |
+| CysC | Cystatin C | 胱抑素C |
+| eGFR | Estimated GFR | 估算肾小球滤过率 |
+
+---
+
+## 3. Complete Blood Count
+
+### Red Blood Cell Series
+
+| Abbreviation | English Name | Chinese Name |
+|------|----------|----------|
+| RBC | RBC count | 红细胞计数 |
+| Hb | Hemoglobin | 血红蛋白 |
+| HCT | Hematocrit | 红细胞压积 |
+| MCV | Mean Corpuscular Volume | 平均红细胞体积 |
+| MCH | Mean Corpuscular Hemoglobin | 平均红细胞血红蛋白含量 |
+| MCHC | Mean Corpuscular Hemoglobin Concentration | 平均红细胞血红蛋白浓度 |
+| RDW | Red Cell Distribution Width | 红细胞体积分布宽度 |
+
+### Platelet Series
+
+| Abbreviation | English Name | Chinese Name |
+|------|----------|----------|
+| PLT | Platelet Count | 血小板计数 |
+| MPV | Mean Platelet Volume | 平均血小板体积 |
+
+### White Blood Cell Series
+
+| Abbreviation | English Name | Chinese Name |
+|------|----------|----------|
+| WBC | White Blood Cell Count | 白细胞计数 |
+| NEUT | Neutrophil | 中性粒细胞 |
+| NEUT% | Neutrophil % | 中性粒细胞百分比 |
+| EOS | Eosinophil | 嗜酸细胞 |
+| EOS% | Eosinophil % | 嗜酸细胞百分比 |
+| BAS | Basophil | 嗜碱细胞 |
+| BAS% | Basophil % | 嗜碱细胞百分比 |
+| LYMPH | Lymphocyte | 淋巴细胞 |
+| LYMPH% | Lymphocyte % | 淋巴细胞百分比 |
+| MONO | Monocyte | 单核细胞 |
+| MONO% | Monocyte % | 单核细胞百分比 |
+
+---
+
+## 4. Routine Urinalysis (Urine Detection)
+
+| Abbreviation | English Name | Chinese Name |
+|------|----------|----------|
+| Color | Color | 颜色 |
+| Clarity | Clarity | 透明度 |
+| pH | pH | 酸碱度 |
+| TUR | Turbidity | 浊度 |
+| SG | Specific Gravity | 比重 |
+| PRO | Protein | 蛋白质 |
+| GLU | Glucose | 糖 |
+| BLD/ERY | Occult Blood/RBC | 隐血或红细胞 |
+| ERY | Erythrocyte | 红细胞 |
+| WBC | White Blood Cell | 白细胞 |
+| LEU | Leucocyte | 白细胞(尿) |
+| NIT | Nitrite | 亚硝酸盐 |
+| KET | Ketone | 酮体 |
+| BIL | Bilirubin | 胆红素 |
+| URO | Urobilinogen | 尿胆原 |
+| CRY | Crystal | 结晶 |
+| SEC | Squamous Epithelial Cells | 鳞状上皮细胞 |
+
+---
+
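One point the tables above leave implicit: `WBC` is listed under both Complete Blood Count and Urine Detection, so an abbreviation alone does not always identify a category. A minimal sketch of a lookup that accounts for this follows, in JavaScript to match the frontend code. `TEST_ITEMS`, `buildIndex`, and `lookup` are hypothetical names not taken from the repository, and only a handful of rows are copied for brevity.

```javascript
// Hypothetical data: a few rows copied from the classification tables above.
// The category labels reuse the English section names from those tables.
const TEST_ITEMS = [
  { category: 'Liver Function', abbr: 'ALT', name: 'Alanine Aminotransferase' },
  { category: 'Kidney Function', abbr: 'CREA', name: 'Creatinine' },
  { category: 'Complete Blood Count', abbr: 'WBC', name: 'White Blood Cell Count' },
  { category: 'Urine Detection', abbr: 'WBC', name: 'White Blood Cell' },
  { category: 'Urine Detection', abbr: 'pH', name: 'pH' },
];

// Build a case-insensitive index: each abbreviation maps to every row that
// uses it, so an ambiguous abbreviation keeps all of its categories.
function buildIndex(items) {
  const index = new Map();
  for (const item of items) {
    const key = item.abbr.toUpperCase();
    if (!index.has(key)) index.set(key, []);
    index.get(key).push(item);
  }
  return index;
}

// Return all rows matching an abbreviation, optionally narrowed by category.
// Unknown abbreviations yield an empty array rather than throwing.
function lookup(index, abbr, category) {
  const matches = index.get(String(abbr).trim().toUpperCase()) || [];
  return category ? matches.filter((m) => m.category === category) : matches;
}

const index = buildIndex(TEST_ITEMS);
// Lists both categories that contain WBC.
console.log(lookup(index, 'wbc').map((m) => m.category));
// Narrows the ambiguous abbreviation to the urine table.
console.log(lookup(index, 'WBC', 'Urine Detection')[0].name);
```

Returning every match and letting the caller narrow by category avoids silently picking the wrong module for an ambiguous abbreviation. Whether the project's actual ABB configuration works this way is not shown in this diff.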