Files
schoolNews/schoolNewsServ/ai/Dify知识库指定方案.md
2025-11-04 18:49:37 +08:00

589 lines
17 KiB
Markdown
Raw Blame History

This file contains ambiguous Unicode characters

This file contains Unicode characters that might be confused with other characters. If you think that this is intentional, you can safely ignore this warning. Use the Escape button to reveal them.

# Dify指定知识库实现方案
## 🎯 问题分析
**需求**在调用Dify对话API时动态指定使用哪些知识库Dataset
**场景**:不同部门用户使用同一个智能体,但只能访问各自授权的知识库
---
## 📋 Dify官方支持的方式
### 方式1知识库检索API + LLM组合推荐⭐⭐
**优点**
- ✅ 完全控制知识库选择
- ✅ 可以实现复杂的权限逻辑
- ✅ 灵活性最高
**缺点**
- ⚠️ 需要自己组合API调用
- ⚠️ 无法使用Dify的完整对话管理功能
#### 步骤1检索相关知识
```http
POST https://api.dify.ai/v1/datasets/{dataset_id}/retrieve
Authorization: Bearer {api_key}
Content-Type: application/json
{
"query": "如何申请奖学金?",
"top_k": 3,
"score_threshold": 0.7
}
```
**响应示例:**
```json
{
"records": [
{
"content": "申请奖学金需要满足以下条件...",
"score": 0.95,
"metadata": {
"document_id": "doc-123",
"document_name": "奖学金管理办法.pdf"
}
}
]
}
```
#### 步骤2多知识库并行检索
```java
@Service
public class DifyKnowledgeService {
/**
* 从多个知识库检索相关内容
*/
public List<RetrievalRecord> retrieveFromMultipleDatasets(
String query,
List<String> datasetIds,
int topK) {
List<CompletableFuture<List<RetrievalRecord>>> futures = datasetIds.stream()
.map(datasetId -> CompletableFuture.supplyAsync(() ->
retrieveFromDataset(datasetId, query, topK)
))
.collect(Collectors.toList());
// 等待所有检索完成
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
// 合并结果并按分数排序
return futures.stream()
.map(CompletableFuture::join)
.flatMap(List::stream)
.sorted((a, b) -> Double.compare(b.getScore(), a.getScore()))
.limit(topK)
.collect(Collectors.toList());
}
/**
* 从单个知识库检索
*/
private List<RetrievalRecord> retrieveFromDataset(
String datasetId,
String query,
int topK) {
String url = difyConfig.getFullApiUrl("/datasets/" + datasetId + "/retrieve");
Map<String, Object> requestBody = new HashMap<>();
requestBody.put("query", query);
requestBody.put("top_k", topK);
requestBody.put("score_threshold", 0.7);
// HTTP请求
HttpResponse response = httpClient.post(url)
.header("Authorization", "Bearer " + apiKey)
.body(requestBody)
.execute();
return parseRetrievalResponse(response);
}
}
```
#### 步骤3组合上下文调用LLM
```java
/**
* 使用检索到的知识回答问题
*/
public String chatWithRetrievedKnowledge(
String query,
List<RetrievalRecord> records,
String conversationId) {
// 构建上下文
String context = records.stream()
.map(r -> "【" + r.getMetadata().get("document_name") + "】\n" + r.getContent())
.collect(Collectors.joining("\n\n"));
// 构建Prompt
String prompt = String.format(
"请基于以下知识库内容回答用户问题。如果知识库中没有相关信息,请明确告知用户。\n\n" +
"知识库内容:\n%s\n\n" +
"用户问题:%s\n\n" +
"回答:",
context, query
);
// 调用Dify Completion API或直接调用LLM
return callLLM(prompt, conversationId);
}
```
---
### 方式2Dify Workflow工作流
**原理**:创建工作流,使用变量控制知识库选择
#### Dify工作流配置
```yaml
workflow_nodes:
- id: start
type: start
outputs:
- query # 用户问题
- dataset_ids # 知识库ID列表变量
- id: kb_retrieval
type: knowledge-retrieval
inputs:
query: "{{#start.query#}}"
datasets: "{{#start.dataset_ids#}}" # 从输入变量读取
top_k: 3
outputs:
- result
- id: llm
type: llm
inputs:
prompt: |
基于以下知识库内容回答问题:
{{#kb_retrieval.result#}}
用户问题:{{#start.query#}}
outputs:
- answer
- id: end
type: end
outputs:
- answer: "{{#llm.answer#}}"
```
#### API调用示例
```java
/**
* 调用Dify Workflow
*/
public void chatWithWorkflow(
String query,
List<String> datasetIds,
String userId,
SseEmitter emitter) {
String url = difyConfig.getFullApiUrl("/workflows/run");
Map<String, Object> inputs = new HashMap<>();
inputs.put("query", query);
inputs.put("dataset_ids", datasetIds); // ⭐ 动态传入知识库列表
Map<String, Object> requestBody = new HashMap<>();
requestBody.put("inputs", inputs);
requestBody.put("response_mode", "streaming");
requestBody.put("user", userId);
// 流式请求
httpClient.postStream(url, requestBody, new StreamCallback() {
@Override
public void onChunk(String chunk) {
emitter.send(chunk);
}
});
}
```
**HTTP请求示例**
```http
POST /v1/workflows/run
Authorization: Bearer {api_key}
Content-Type: application/json
{
"inputs": {
"query": "如何申请奖学金?",
"dataset_ids": ["dataset-edu-001", "dataset-public-001"]
},
"response_mode": "streaming",
"user": "user-123"
}
```
---
### 方式3多应用切换不推荐
为不同部门创建不同的Dify应用
```
部门A -> App A绑定知识库A1, A2
部门B -> App B绑定知识库B1, B2
```
**缺点**
- ❌ 管理复杂
- ❌ 无法共享公共知识库
- ❌ 扩展性差
---
## 🎨 推荐实现方案
### 方案知识库检索API + 自定义LLM调用
#### 完整实现代码
```java
@Service
public class AiChatServiceImpl implements AiChatService {
@Autowired
private AiKnowledgeMapper knowledgeMapper;
@Autowired
private DifyApiClient difyApiClient;
@Autowired
private AiMessageMapper messageMapper;
/**
* 流式对话(带知识库权限隔离)
*/
@Override
public void streamChat(
String message,
String conversationId,
String userId,
SseEmitter emitter) {
try {
// 1. 获取当前登录用户的部门角色信息通过LoginUtil
List<UserDeptRoleVO> userDeptRoles = LoginUtil.getCurrentDeptRole();
// 2. 查询用户有权限的知识库(自动权限过滤✅)
TbAiKnowledge filter = new TbAiKnowledge();
filter.setStatus(1); // 只查询启用的
List<TbAiKnowledge> authorizedKnowledges =
knowledgeMapper.selectAiKnowledges(
filter,
userDeptRoles // 直接传入LoginUtil获取的用户权限信息
);
// 3. 提取Dify Dataset IDs
List<String> datasetIds = authorizedKnowledges.stream()
.map(TbAiKnowledge::getDifyDatasetId)
.filter(Objects::nonNull)
.collect(Collectors.toList());
if (datasetIds.isEmpty()) {
emitter.send("您当前没有可访问的知识库,无法进行对话。");
emitter.complete();
return;
}
// 4. 从多个知识库检索相关内容
List<RetrievalRecord> retrievalRecords =
difyApiClient.retrieveFromMultipleDatasets(
message,
datasetIds,
5 // Top K
);
// 5. 构建上下文
String context = buildContext(retrievalRecords, authorizedKnowledges);
// 6. 调用LLM流式对话
difyApiClient.streamChatWithContext(
message,
context,
conversationId,
userId,
new StreamCallback() {
private StringBuilder fullAnswer = new StringBuilder();
@Override
public void onChunk(String chunk) {
fullAnswer.append(chunk);
emitter.send(chunk);
}
@Override
public void onComplete() {
// 保存消息
saveMessages(
conversationId,
userId,
message,
fullAnswer.toString(),
retrievalRecords
);
emitter.complete();
}
@Override
public void onError(Throwable error) {
log.error("对话失败", error);
emitter.completeWithError(error);
}
}
);
} catch (Exception e) {
log.error("流式对话异常", e);
emitter.completeWithError(e);
}
}
/**
* 构建上下文
*/
private String buildContext(
List<RetrievalRecord> records,
List<TbAiKnowledge> knowledges) {
Map<String, String> knowledgeTitles = knowledges.stream()
.collect(Collectors.toMap(
TbAiKnowledge::getDifyDatasetId,
TbAiKnowledge::getTitle
));
return records.stream()
.map(r -> {
String datasetId = r.getDatasetId();
String knowledgeTitle = knowledgeTitles.getOrDefault(datasetId, "未知知识库");
return String.format(
"【来源:%s - %s】\n%s",
knowledgeTitle,
r.getDocumentName(),
r.getContent()
);
})
.collect(Collectors.joining("\n\n---\n\n"));
}
}
```
#### DifyApiClient实现
```java
@Component
public class DifyApiClient {
@Autowired
private DifyConfig difyConfig;
private final OkHttpClient httpClient;
public DifyApiClient(DifyConfig difyConfig) {
this.difyConfig = difyConfig;
this.httpClient = new OkHttpClient.Builder()
.connectTimeout(difyConfig.getConnectTimeout(), TimeUnit.SECONDS)
.readTimeout(difyConfig.getReadTimeout(), TimeUnit.SECONDS)
.build();
}
/**
* 从多个知识库检索
*/
public List<RetrievalRecord> retrieveFromMultipleDatasets(
String query,
List<String> datasetIds,
int topK) {
// 并行检索所有知识库
List<CompletableFuture<List<RetrievalRecord>>> futures =
datasetIds.stream()
.map(id -> CompletableFuture.supplyAsync(() ->
retrieveFromDataset(id, query, topK)
))
.collect(Collectors.toList());
// 等待完成
CompletableFuture.allOf(futures.toArray(new CompletableFuture[0])).join();
// 合并并排序
return futures.stream()
.map(CompletableFuture::join)
.flatMap(List::stream)
.sorted((a, b) -> Double.compare(b.getScore(), a.getScore()))
.limit(topK)
.collect(Collectors.toList());
}
/**
* 从单个知识库检索
*/
private List<RetrievalRecord> retrieveFromDataset(
String datasetId,
String query,
int topK) {
String url = String.format(
"%s/datasets/%s/retrieve",
difyConfig.getApiBaseUrl(),
datasetId
);
JSONObject body = new JSONObject();
body.put("query", query);
body.put("top_k", topK);
body.put("score_threshold", 0.7);
Request request = new Request.Builder()
.url(url)
.header("Authorization", "Bearer " + difyConfig.getApiKey())
.header("Content-Type", "application/json")
.post(RequestBody.create(
body.toString(),
MediaType.parse("application/json")
))
.build();
try (Response response = httpClient.newCall(request).execute()) {
if (!response.isSuccessful()) {
throw new DifyException("知识库检索失败: " + response.message());
}
String responseBody = response.body().string();
return parseRetrievalResponse(datasetId, responseBody);
} catch (IOException e) {
throw new DifyException("知识库检索异常", e);
}
}
/**
* 流式对话(带上下文)
*/
public void streamChatWithContext(
String query,
String context,
String conversationId,
String userId,
StreamCallback callback) {
String url = difyConfig.getApiBaseUrl() + "/chat-messages";
// 构建完整Prompt
String fullPrompt = String.format(
"请基于以下知识库内容回答用户问题。" +
"如果知识库中没有相关信息,请明确告知用户。\n\n" +
"知识库内容:\n%s\n\n" +
"用户问题:%s",
context, query
);
JSONObject body = new JSONObject();
body.put("query", fullPrompt);
body.put("conversation_id", conversationId);
body.put("user", userId);
body.put("response_mode", "streaming");
body.put("inputs", new JSONObject());
Request request = new Request.Builder()
.url(url)
.header("Authorization", "Bearer " + difyConfig.getApiKey())
.header("Content-Type", "application/json")
.post(RequestBody.create(
body.toString(),
MediaType.parse("application/json")
))
.build();
// SSE流式处理
httpClient.newCall(request).enqueue(new Callback() {
@Override
public void onResponse(Call call, Response response) {
if (!response.isSuccessful()) {
callback.onError(new DifyException("对话失败: " + response.message()));
return;
}
try (BufferedReader reader = new BufferedReader(
new InputStreamReader(response.body().byteStream()))) {
String line;
while ((line = reader.readLine()) != null) {
if (line.startsWith("data: ")) {
String data = line.substring(6);
if (!"[DONE]".equals(data)) {
JSONObject json = new JSONObject(data);
String chunk = json.optString("answer", "");
if (!chunk.isEmpty()) {
callback.onChunk(chunk);
}
}
}
}
callback.onComplete();
} catch (Exception e) {
callback.onError(e);
}
}
@Override
public void onFailure(Call call, IOException e) {
callback.onError(e);
}
});
}
}
```
---
## 📊 三种方式对比
| 方案 | 灵活性 | 实现难度 | 性能 | 推荐度 |
|------|--------|----------|------|--------|
| 检索API + 自定义LLM | ⭐⭐⭐⭐⭐ | ⭐⭐⭐ | ⭐⭐⭐⭐ | ⭐⭐⭐⭐⭐ |
| Workflow工作流 | ⭐⭐⭐⭐ | ⭐⭐ | ⭐⭐⭐⭐⭐ | ⭐⭐⭐⭐ |
| 多应用切换 | ⭐⭐ | ⭐ | ⭐⭐⭐ | ⭐ |
---
## 🎯 最终推荐方案
**使用"检索API + 自定义LLM"方案**
**理由**
1. ✅ 完全控制知识库访问权限
2. ✅ 可以实现复杂的部门隔离逻辑
3. ✅ 支持并行检索多个知识库
4. ✅ 可以自定义Prompt和上下文
5. ✅ 灵活性最高,适合企业级应用
**实现步骤**
1. 用户发起对话
2. 根据用户权限查询可访问的知识库Mapper已实现✅
3. 并行调用Dify检索API获取相关内容
4. 合并结果构建上下文
5. 调用LLM流式生成答案
6. 保存对话记录(含知识来源)
这样既利用了Dify的知识库能力又保持了完全的控制权🎉