diff --git a/.gitignore b/.gitignore
index 195f9bd6..3fed9d2b 100644
--- a/.gitignore
+++ b/.gitignore
@@ -6,4 +6,5 @@
**/*.difypkg
urbanLifeServ/*
-*/.data
\ No newline at end of file
+*/.data
+docs
\ No newline at end of file
diff --git a/ai-management-dify b/ai-management-dify
index 9fffb6e4..0de13a34 160000
--- a/ai-management-dify
+++ b/ai-management-dify
@@ -1 +1 @@
-Subproject commit 9fffb6e421cdc0d84a3a730c04a799456ad66776
+Subproject commit 0de13a34959c0179ac9a675f51ee2c8cd7e38450
diff --git a/ai-management-platform b/ai-management-platform
index 6bbe3e41..085ef040 160000
--- a/ai-management-platform
+++ b/ai-management-platform
@@ -1 +1 @@
-Subproject commit 6bbe3e4181466bc86712e2d0abdba36ed8988082
+Subproject commit 085ef040aedda9863b05f67b5fef0b242efa2ce3
diff --git a/difyPlugin/DifyCLI.md b/difyPlugin/DifyCLI.md
deleted file mode 100644
index 8f8a4387..00000000
--- a/difyPlugin/DifyCLI.md
+++ /dev/null
@@ -1,146 +0,0 @@
-> ## Documentation Index
-> Fetch the complete documentation index at: https://docs.dify.ai/llms.txt
-> Use this file to discover all available pages before exploring further.
-
-# CLI
-
-> Command-line interface for Dify plugin development
-
- ⚠️ This document was automatically translated by AI. If anything is inaccurate, refer to the [English original](/en/develop-plugin/getting-started/cli).
-
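-Use the command-line interface (CLI) to set up and package your Dify plugins. The CLI provides a streamlined way to manage your plugin development workflow, from initialization to packaging.
-
-This guide walks you through using the CLI for Dify plugin development.
-
-## Prerequisites
-
-Before you begin, make sure the following are installed:
-
-* Python 3.12 or later
-* Dify CLI
-* Homebrew (for Mac users)
-
-## Create a Dify Plugin Project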
-
-
-
- ```bash theme={null}
- brew tap langgenius/dify
- brew install dify
- ```
-
-
-
- Get the latest Dify CLI from the [Dify GitHub releases page](https://github.com/langgenius/dify-plugin-daemon/releases)
-
- ```bash theme={null}
- # Download dify-plugin-darwin-arm64
- chmod +x dify-plugin-darwin-arm64
- mv dify-plugin-darwin-arm64 dify
- sudo mv dify /usr/local/bin/
- ```
-
-
-
-You have now installed the Dify CLI. You can verify the installation by running the following command:
-
-```bash theme={null}
-dify version
-```
-
-You can create a new Dify plugin project with the following command:
-
-```bash theme={null}
-dify plugin init
-```
-
-Fill in the required fields as prompted:
-
-```bash theme={null}
-Edit profile of the plugin
-Plugin name (press Enter to next step): hello-world
-Author (press Enter to next step): langgenius
-Description (press Enter to next step): hello world example
-Repository URL (Optional) (press Enter to next step): Repository URL (Optional)
- Enable multilingual README: [✔] English is required by default
-
-Languages to generate:
- English: [✔] (required)
- → 简体中文 (Simplified Chinese): [✔]
- 日本語 (Japanese): [✘]
- Português (Portuguese - Brazil): [✘]
-
-Controls:
- ↑/↓ Navigate • Space/Tab Toggle selection • Enter Next step
-```
-
-Select `python` and press Enter to continue with the Python plugin template.
-
-```bash theme={null}
-Select the type of plugin you want to create, and press `Enter` to continue
-Before starting, here's some basic knowledge about Plugin types in Dify:
-
-- Tool: Tool Providers like Google Search, Stable Diffusion, etc. Used to perform specific tasks.
-- Model: Model Providers like OpenAI, Anthropic, etc. Use their models to enhance AI capabilities.
-- Endpoint: Similar to Service API in Dify and Ingress in Kubernetes. Extend HTTP services as endpoints with custom logi
-- Agent Strategy: Implement your own agent strategies like Function Calling, ReAct, ToT, CoT, etc.
-
-Based on the ability you want to extend, Plugins are divided into four types: Tool, Model, Extension, and Agent Strategy
-
-- Tool: A tool provider that can also implement endpoints. For example, building a Discord Bot requires both Sending and
-- Model: Strictly for model providers, no other extensions allowed.
-- Extension: For simple HTTP services that extend functionality.
-- Agent Strategy: Implement custom agent logic with a focused approach.
-
-We've provided templates to help you get started. Choose one of the options below:
--> tool
- agent-strategy
- llm
- text-embedding
- rerank
- tts
- speech2text
- moderation
- extension
-```
-
-Enter the minimum Dify version, or leave it blank to use the latest version:
-
-```bash theme={null}
-Edit minimal Dify version requirement, leave it blank by default
-Minimal Dify version (press Enter to next step):
-```
-
-You are now all set! The CLI creates a new directory named after the plugin name you provided and sets up the basic structure for your plugin.
-
-```bash theme={null}
-cd hello-world
-```
-
-## Run the Plugin
-
-Make sure you are in the hello-world directory:
-
-```bash theme={null}
-cp .env.example .env
-```
-
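-Edit the `.env` file to set the plugin's environment variables, such as API keys or other configuration. You can find these values in the Dify dashboard: log in to your Dify environment, click the "Plugins" icon in the top-right corner, then click the debug icon (it looks like a bug). In the pop-up window, copy the "API Key" and "Host Address". (Refer to your corresponding local screenshot, which shows the interface for obtaining the key and host address.)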
-
-```bash theme={null}
-INSTALL_METHOD=remote
-REMOTE_INSTALL_HOST=debug-plugin.dify.dev
-REMOTE_INSTALL_PORT=5003
-REMOTE_INSTALL_KEY=********-****-****-****-************
-```
-
-You can now run your plugin locally with the following commands:
-
-```bash theme={null}
-pip install -r requirements.txt
-python -m main
-```
-
-***
-
-[Edit this page](https://github.com/langgenius/dify-docs/edit/main/en/develop-plugin/getting-started/cli.mdx) | [Report an issue](https://github.com/langgenius/dify-docs/issues/new?template=docs.yml)
diff --git a/difyPlugin/pdf/.difyignore b/difyPlugin/pdf/.difyignore
deleted file mode 100644
index 4685c5eb..00000000
--- a/difyPlugin/pdf/.difyignore
+++ /dev/null
@@ -1,184 +0,0 @@
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-.python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-Pipfile.lock
-
-# UV
-# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-uv.lock
-
-# poetry
-# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-poetry.lock
-
-# pdm
-# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-# in version control.
-# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
-.pdm.toml
-.pdm-python
-.pdm-build/
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-# and can be added to the global gitignore or merged into this file. For a more nuclear
-# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-.idea/
-
-# Vscode
-.vscode/
-
-# Git
-.git/
-.gitignore
-.github/
-
-# Mac
-.DS_Store
-
-# Windows
-Thumbs.db
-
-# Dify plugin packages
-# To prevent packaging repetitively
-*.difypkg
-
diff --git a/difyPlugin/pdf/.env.example b/difyPlugin/pdf/.env.example
deleted file mode 100644
index 60358af8..00000000
--- a/difyPlugin/pdf/.env.example
+++ /dev/null
@@ -1,3 +0,0 @@
-INSTALL_METHOD=remote
-REMOTE_INSTALL_URL=debug.dify.ai:5003
-REMOTE_INSTALL_KEY=********-****-****-****-************
diff --git a/difyPlugin/pdf/.github/workflows/plugin-publish.yml b/difyPlugin/pdf/.github/workflows/plugin-publish.yml
deleted file mode 100644
index d24c4dd5..00000000
--- a/difyPlugin/pdf/.github/workflows/plugin-publish.yml
+++ /dev/null
@@ -1,109 +0,0 @@
-name: Plugin Publish Workflow
-
-on:
- release:
- types: [published]
-
-jobs:
- publish:
- runs-on: ubuntu-latest
- steps:
- - name: Checkout code
- uses: actions/checkout@v3
-
- - name: Download CLI tool
- run: |
- mkdir -p $RUNNER_TEMP/bin
- cd $RUNNER_TEMP/bin
-
- wget https://github.com/langgenius/dify-plugin-daemon/releases/download/0.0.6/dify-plugin-linux-amd64
- chmod +x dify-plugin-linux-amd64
-
- echo "CLI tool location:"
- pwd
- ls -la dify-plugin-linux-amd64
-
- - name: Get basic info from manifest
- id: get_basic_info
- run: |
- PLUGIN_NAME=$(grep "^name:" manifest.yaml | cut -d' ' -f2)
- echo "Plugin name: $PLUGIN_NAME"
- echo "plugin_name=$PLUGIN_NAME" >> $GITHUB_OUTPUT
-
- VERSION=$(grep "^version:" manifest.yaml | cut -d' ' -f2)
- echo "Plugin version: $VERSION"
- echo "version=$VERSION" >> $GITHUB_OUTPUT
-
- # If the author's name is not your github username, you can change the author here
- AUTHOR=$(grep "^author:" manifest.yaml | cut -d' ' -f2)
- echo "Plugin author: $AUTHOR"
- echo "author=$AUTHOR" >> $GITHUB_OUTPUT
-
- - name: Package Plugin
- id: package
- run: |
- cd $GITHUB_WORKSPACE
- PACKAGE_NAME="${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}.difypkg"
- $RUNNER_TEMP/bin/dify-plugin-linux-amd64 plugin package . -o "$PACKAGE_NAME"
-
- echo "Package result:"
- ls -la "$PACKAGE_NAME"
- echo "package_name=$PACKAGE_NAME" >> $GITHUB_OUTPUT
-
- echo "\nFull file path:"
- pwd
- echo "\nDirectory structure:"
- tree || ls -R
-
- - name: Checkout target repo
- uses: actions/checkout@v3
- with:
- repository: ${{steps.get_basic_info.outputs.author}}/dify-plugins
- path: dify-plugins
- token: ${{ secrets.PLUGIN_ACTION }}
- fetch-depth: 1
- persist-credentials: true
-
- - name: Prepare and create PR
- run: |
- PACKAGE_NAME="${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}.difypkg"
- mkdir -p dify-plugins/${{ steps.get_basic_info.outputs.author }}/${{ steps.get_basic_info.outputs.plugin_name }}
- mv "$PACKAGE_NAME" dify-plugins/${{ steps.get_basic_info.outputs.author }}/${{ steps.get_basic_info.outputs.plugin_name }}/
-
- cd dify-plugins
-
- git config user.name "GitHub Actions"
- git config user.email "actions@github.com"
-
- git fetch origin main
- git checkout main
- git pull origin main
-
- BRANCH_NAME="bump-${{ steps.get_basic_info.outputs.plugin_name }}-plugin-${{ steps.get_basic_info.outputs.version }}"
- git checkout -b "$BRANCH_NAME"
-
- git add .
- git commit -m "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin to version ${{ steps.get_basic_info.outputs.version }}"
-
- git push -u origin "$BRANCH_NAME" --force
-
- git branch -a
- echo "Waiting for branch to sync..."
- sleep 10 # Wait 10 seconds for branch sync
-
- - name: Create PR via GitHub API
- env:
- # How to config the token:
- # 1. Profile -> Settings -> Developer settings -> Personal access tokens -> Generate new token (with repo scope) -> Copy the token
- # 2. Go to the target repository -> Settings -> Secrets and variables -> Actions -> New repository secret -> Add the token as PLUGIN_ACTION
- GH_TOKEN: ${{ secrets.PLUGIN_ACTION }}
- run: |
- gh pr create \
- --repo langgenius/dify-plugins \
- --head "${{ steps.get_basic_info.outputs.author }}:${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}" \
- --base main \
- --title "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin to version ${{ steps.get_basic_info.outputs.version }}" \
- --body "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin package to version ${{ steps.get_basic_info.outputs.version }}
-
- Changes:
- - Updated plugin package file" || echo "PR already exists or creation skipped." # Handle cases where PR already exists
diff --git a/difyPlugin/pdf/.gitignore b/difyPlugin/pdf/.gitignore
deleted file mode 100644
index a16dc979..00000000
--- a/difyPlugin/pdf/.gitignore
+++ /dev/null
@@ -1,176 +0,0 @@
-# Byte-compiled / optimized / DLL files
-__pycache__/
-*.py[cod]
-*$py.class
-
-# C extensions
-*.so
-
-# Distribution / packaging
-.Python
-build/
-develop-eggs/
-dist/
-downloads/
-eggs/
-.eggs/
-lib/
-lib64/
-parts/
-sdist/
-var/
-wheels/
-share/python-wheels/
-*.egg-info/
-.installed.cfg
-*.egg
-MANIFEST
-
-# PyInstaller
-# Usually these files are written by a python script from a template
-# before PyInstaller builds the exe, so as to inject date/other infos into it.
-*.manifest
-*.spec
-
-# Installer logs
-pip-log.txt
-pip-delete-this-directory.txt
-
-# Unit test / coverage reports
-htmlcov/
-.tox/
-.nox/
-.coverage
-.coverage.*
-.cache
-nosetests.xml
-coverage.xml
-*.cover
-*.py,cover
-.hypothesis/
-.pytest_cache/
-cover/
-
-# Translations
-*.mo
-*.pot
-
-# Django stuff:
-*.log
-local_settings.py
-db.sqlite3
-db.sqlite3-journal
-
-# Flask stuff:
-instance/
-.webassets-cache
-
-# Scrapy stuff:
-.scrapy
-
-# Sphinx documentation
-docs/_build/
-
-# PyBuilder
-.pybuilder/
-target/
-
-# Jupyter Notebook
-.ipynb_checkpoints
-
-# IPython
-profile_default/
-ipython_config.py
-
-# pyenv
-# For a library or package, you might want to ignore these files since the code is
-# intended to run in multiple environments; otherwise, check them in:
-# .python-version
-
-# pipenv
-# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
-# However, in case of collaboration, if having platform-specific dependencies or dependencies
-# having no cross-platform support, pipenv may install dependencies that don't work, or not
-# install all needed dependencies.
-#Pipfile.lock
-
-# UV
-# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-#uv.lock
-
-# poetry
-# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
-# This is especially recommended for binary packages to ensure reproducibility, and is more
-# commonly ignored for libraries.
-# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
-#poetry.lock
-
-# pdm
-# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
-#pdm.lock
-# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
-# in version control.
-# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
-.pdm.toml
-.pdm-python
-.pdm-build/
-
-# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
-__pypackages__/
-
-# Celery stuff
-celerybeat-schedule
-celerybeat.pid
-
-# SageMath parsed files
-*.sage.py
-
-# Environments
-.env
-.venv
-env/
-venv/
-ENV/
-env.bak/
-venv.bak/
-
-# Spyder project settings
-.spyderproject
-.spyproject
-
-# Rope project settings
-.ropeproject
-
-# mkdocs documentation
-/site
-
-# mypy
-.mypy_cache/
-.dmypy.json
-dmypy.json
-
-# Pyre type checker
-.pyre/
-
-# pytype static type analyzer
-.pytype/
-
-# Cython debug symbols
-cython_debug/
-
-# PyCharm
-# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
-# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
-# and can be added to the global gitignore or merged into this file. For a more nuclear
-# option (not recommended) you can uncomment the following to ignore the entire idea folder.
-.idea/
-
-# Vscode
-.vscode/
-
-# macOS
-.DS_Store
-.AppleDouble
-.LSOverride
\ No newline at end of file
diff --git a/difyPlugin/pdf/GUIDE.md b/difyPlugin/pdf/GUIDE.md
deleted file mode 100644
index 27d33f9d..00000000
--- a/difyPlugin/pdf/GUIDE.md
+++ /dev/null
@@ -1,137 +0,0 @@
-# Dify Plugin Development Guide
-
-Welcome to Dify plugin development! This guide will help you get started quickly.
-
-## Plugin Types
-
-Dify plugins extend three main capabilities:
-
-| Type | Description | Example |
-|------|-------------|---------|
-| **Tool** | Perform specific tasks | Google Search, Stable Diffusion |
-| **Model** | AI model integrations | OpenAI, Anthropic |
-| **Endpoint** | HTTP services | Custom APIs, integrations |
-
-You can create:
-- **Tool**: Tool provider with optional endpoints (e.g., Discord bot)
-- **Model**: Model provider only
-- **Extension**: Simple HTTP service
-
-## Setup
-
-### Requirements
-- Python 3.11+
-- Dependencies: `pip install -r requirements.txt`
-
-## Development Process
-
-
-1. Manifest Structure
-
-Edit `manifest.yaml` to describe your plugin:
-
-```yaml
-version: 0.1.0 # Required: Plugin version
-type: plugin # Required: plugin or bundle
-author: YourOrganization # Required: Organization name
-label: # Required: Multi-language names
- en_US: Plugin Name
- zh_Hans: 插件名称
-created_at: 2023-01-01T00:00:00Z # Required: Creation time (RFC3339)
-icon: assets/icon.png # Required: Icon path
-
-# Resources and permissions
-resource:
- memory: 268435456 # Max memory (bytes)
- permission:
- tool:
- enabled: true # Tool permission
- model:
- enabled: true # Model permission
- llm: true
- text_embedding: false
- # Other model types...
- # Other permissions...
-
-# Extensions definition
-plugins:
- tools:
- - tools/my_tool.yaml # Tool definition files
- models:
- - models/my_model.yaml # Model definition files
- endpoints:
- - endpoints/my_api.yaml # Endpoint definition files
-
-# Runtime metadata
-meta:
- version: 0.0.1 # Manifest format version
- arch:
- - amd64
- - arm64
- runner:
- language: python
- version: "3.12"
- entrypoint: main
-```
-
-**Restrictions:**
-- Cannot extend both tools and models
-- Must have at least one extension
-- Cannot extend both models and endpoints
-- Limited to one supplier per extension type
-
-
-
-2. Implementation Examples
-
-Study these examples to understand plugin implementation:
-
-- [OpenAI](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/openai) - Model provider
-- [Google Search](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/google) - Tool provider
-- [Neko](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/neko) - Endpoint group
-
-
-
-3. Testing & Debugging
-
-1. Copy `.env.example` to `.env` and configure:
- ```
- INSTALL_METHOD=remote
- REMOTE_INSTALL_URL=debug.dify.ai:5003
- REMOTE_INSTALL_KEY=your-debug-key
- ```
-
-2. Run your plugin:
- ```bash
- python -m main
- ```
-
-3. Refresh your Dify instance to see the plugin (marked as "debugging")
-
-
-
-4. Publishing
-
-#### Manual Packaging
-```bash
-dify-plugin plugin package ./YOUR_PLUGIN_DIR
-```
-
-#### Automated GitHub Workflow
-
-Configure GitHub Actions to automate PR creation:
-
-1. Create a Personal Access Token for your forked repository
-2. Add it as `PLUGIN_ACTION` secret in your source repo
-3. Create `.github/workflows/plugin-publish.yml`
-
-When you create a release, the action will:
-- Package your plugin
-- Create a PR to your fork
-
-[Detailed workflow documentation](https://docs.dify.ai/plugins/publish-plugins/plugin-auto-publish-pr)
-
-
-## Privacy Policy
-
-If publishing to the Marketplace, provide a privacy policy in [PRIVACY.md](PRIVACY.md).
\ No newline at end of file
diff --git a/difyPlugin/pdf/PRIVACY.md b/difyPlugin/pdf/PRIVACY.md
deleted file mode 100644
index b088c73a..00000000
--- a/difyPlugin/pdf/PRIVACY.md
+++ /dev/null
@@ -1,3 +0,0 @@
-## Privacy
-
-!!! Please fill in the privacy policy of the plugin.
\ No newline at end of file
diff --git a/difyPlugin/pdf/README.md b/difyPlugin/pdf/README.md
deleted file mode 100644
index 0b767ecf..00000000
--- a/difyPlugin/pdf/README.md
+++ /dev/null
@@ -1,10 +0,0 @@
-## pdf
-
-**Author:** yslg
-**Version:** 0.0.1
-**Type:** tool
-
-### Description
-
-
-
diff --git a/difyPlugin/pdf/_assets/icon-dark.svg b/difyPlugin/pdf/_assets/icon-dark.svg
deleted file mode 100644
index 75a6cc1b..00000000
--- a/difyPlugin/pdf/_assets/icon-dark.svg
+++ /dev/null
@@ -1,55 +0,0 @@
-
-
diff --git a/difyPlugin/pdf/_assets/icon.svg b/difyPlugin/pdf/_assets/icon.svg
deleted file mode 100644
index 1decb4e0..00000000
--- a/difyPlugin/pdf/_assets/icon.svg
+++ /dev/null
@@ -1,55 +0,0 @@
-
-
diff --git a/difyPlugin/pdf/main.py b/difyPlugin/pdf/main.py
deleted file mode 100644
index 7e1a983d..00000000
--- a/difyPlugin/pdf/main.py
+++ /dev/null
@@ -1,6 +0,0 @@
-from dify_plugin import Plugin, DifyPluginEnv
-
-plugin = Plugin(DifyPluginEnv(MAX_REQUEST_TIMEOUT=120))
-
-if __name__ == '__main__':
- plugin.run()
diff --git a/difyPlugin/pdf/manifest.yaml b/difyPlugin/pdf/manifest.yaml
deleted file mode 100644
index 27f075f3..00000000
--- a/difyPlugin/pdf/manifest.yaml
+++ /dev/null
@@ -1,40 +0,0 @@
-version: 0.0.1
-type: plugin
-author: yslg
-name: pdf
-label:
- en_US: pdf
- ja_JP: pdf
- zh_Hans: pdf
- pt_BR: pdf
-description:
- en_US: pdfTools
- ja_JP: pdfTools
- zh_Hans: pdfTools
- pt_BR: pdfTools
-icon: icon.svg
-icon_dark: icon-dark.svg
-resource:
- memory: 268435456
- permission:
- tool:
- enabled: true
- model:
- enabled: true
- llm: true
-plugins:
- tools:
- - provider/pdf.yaml
-meta:
- version: 0.0.1
- arch:
- - amd64
- - arm64
- runner:
- language: python
- version: "3.12"
- entrypoint: main
- minimum_dify_version: null
-created_at: 2026-03-02T13:21:03.2806864+08:00
-privacy: PRIVACY.md
-verified: false
diff --git a/difyPlugin/pdf/plugin.json b/difyPlugin/pdf/plugin.json
deleted file mode 100644
index a3513216..00000000
--- a/difyPlugin/pdf/plugin.json
+++ /dev/null
@@ -1,64 +0,0 @@
-{
- "name": "pdf-plugin",
- "version": "1.0.0",
- "description": "PDF plugin for analyzing table of contents and extracting text",
- "author": "System",
- "type": "tool",
- "main": "main.py",
- "requirements": "requirements.txt",
- "icon": "https://neeko-copilot.bytedance.net/api/text2image?prompt=PDF%20document%20icon&size=square",
- "settings": [
- {
- "key": "debug",
- "type": "boolean",
- "default": false,
- "description": "Enable debug mode"
- }
- ],
- "functions": [
- {
- "name": "analyze_toc",
- "description": "Analyze PDF and find table of contents",
- "parameters": {
- "type": "object",
- "properties": {
- "file": {
- "type": "file",
- "description": "PDF file to analyze",
- "fileTypes": ["pdf"]
- }
- },
- "required": ["file"]
- }
- },
- {
- "name": "extract_text",
- "description": "Extract text from specified page range",
- "parameters": {
- "type": "object",
- "properties": {
- "file": {
- "type": "file",
- "description": "PDF file to extract text from",
- "fileTypes": ["pdf"]
- },
- "page_range": {
- "type": "object",
- "properties": {
- "start": {
- "type": "integer",
- "default": 0,
- "description": "Start page index"
- },
- "end": {
- "type": "integer",
- "description": "End page index"
- }
- }
- }
- },
- "required": ["file"]
- }
- }
- ]
-}
\ No newline at end of file
diff --git a/difyPlugin/pdf/provider/pdf.py b/difyPlugin/pdf/provider/pdf.py
deleted file mode 100644
index 50e7069e..00000000
--- a/difyPlugin/pdf/provider/pdf.py
+++ /dev/null
@@ -1,53 +0,0 @@
-from typing import Any
-
-from dify_plugin import ToolProvider
-from dify_plugin.errors.tool import ToolProviderCredentialValidationError
-
-
-class PdfProvider(ToolProvider):
-
- def _validate_credentials(self, credentials: dict[str, Any]) -> None:
- try:
- """
- IMPLEMENT YOUR VALIDATION HERE
- """
- except Exception as e:
- raise ToolProviderCredentialValidationError(str(e))
-
- #########################################################################################
- # If OAuth is supported, uncomment the following functions.
- # Warning: please make sure that the sdk version is 0.4.2 or higher.
- #########################################################################################
- # def _oauth_get_authorization_url(self, redirect_uri: str, system_credentials: Mapping[str, Any]) -> str:
- # """
- # Generate the authorization URL for pdf OAuth.
- # """
- # try:
- # """
- # IMPLEMENT YOUR AUTHORIZATION URL GENERATION HERE
- # """
- # except Exception as e:
- # raise ToolProviderOAuthError(str(e))
- # return ""
-
- # def _oauth_get_credentials(
- # self, redirect_uri: str, system_credentials: Mapping[str, Any], request: Request
- # ) -> Mapping[str, Any]:
- # """
- # Exchange code for access_token.
- # """
- # try:
- # """
- # IMPLEMENT YOUR CREDENTIALS EXCHANGE HERE
- # """
- # except Exception as e:
- # raise ToolProviderOAuthError(str(e))
- # return dict()
-
- # def _oauth_refresh_credentials(
- # self, redirect_uri: str, system_credentials: Mapping[str, Any], credentials: Mapping[str, Any]
- # ) -> OAuthCredentials:
- # """
- # Refresh the credentials
- # """
- # return OAuthCredentials(credentials=credentials, expires_at=-1)
diff --git a/difyPlugin/pdf/provider/pdf.yaml b/difyPlugin/pdf/provider/pdf.yaml
deleted file mode 100644
index 754fd59d..00000000
--- a/difyPlugin/pdf/provider/pdf.yaml
+++ /dev/null
@@ -1,21 +0,0 @@
-identity:
- author: "yslg"
- name: "pdf"
- label:
- en_US: "pdf"
- zh_Hans: "pdf"
- pt_BR: "pdf"
- ja_JP: "pdf"
- description:
- en_US: "pdfTools"
- zh_Hans: "pdfTools"
- pt_BR: "pdfTools"
- ja_JP: "pdfTools"
- icon: "icon.svg"
-
-tools:
- - tools/pdf_toc.yaml
- - tools/pdf_to_markdown.yaml
-extra:
- python:
- source: provider/pdf.py
diff --git a/difyPlugin/pdf/requirements.txt b/difyPlugin/pdf/requirements.txt
deleted file mode 100644
index 80735ec2..00000000
--- a/difyPlugin/pdf/requirements.txt
+++ /dev/null
@@ -1,2 +0,0 @@
-dify_plugin>=0.4.0,<0.7.0
-pymupdf>=1.27.1
\ No newline at end of file
diff --git a/difyPlugin/pdf/tools/pdf_to_markdown.py b/difyPlugin/pdf/tools/pdf_to_markdown.py
deleted file mode 100644
index 75367173..00000000
--- a/difyPlugin/pdf/tools/pdf_to_markdown.py
+++ /dev/null
@@ -1,234 +0,0 @@
-import json
-import re
-from collections.abc import Generator
-from typing import Any
-
-import fitz
-from dify_plugin import Tool
-from dify_plugin.entities.tool import ToolInvokeMessage
-
-
-class PdfToMarkdownTool(Tool):
- """Convert PDF to Markdown using an external catalog array."""
-
- def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
- file = tool_parameters.get("file")
- catalog_text = (tool_parameters.get("catalog") or "").strip()
- if not file:
- yield self.create_text_message("Error: file is required")
- return
- if not catalog_text:
- yield self.create_text_message("Error: catalog is required")
- return
-
- catalog = self._parse_catalog(catalog_text)
- if not catalog:
- yield self.create_text_message("Error: catalog must be a JSON array with title and page indexes")
- return
-
- doc = fitz.open(stream=file.blob, filetype="pdf")
- try:
- num_pages = len(doc)
- hf_texts = self._detect_headers_footers(doc, num_pages)
- page_mds = [self._page_to_markdown(doc[index], hf_texts) for index in range(num_pages)]
- final_md = self._assemble_by_catalog(catalog, page_mds, num_pages)
-
- yield self.create_text_message(final_md)
- yield self.create_blob_message(
- blob=final_md.encode("utf-8"),
- meta={"mime_type": "text/markdown"},
- )
- finally:
- doc.close()
-
- def _parse_catalog(self, catalog_text: str) -> list[dict[str, Any]]:
- try:
- raw = json.loads(catalog_text)
- except Exception:
- return []
-
- if not isinstance(raw, list):
- return []
-
- result: list[dict[str, Any]] = []
- for item in raw:
- if not isinstance(item, dict):
- continue
-
- title = str(item.get("title") or "").strip() or "Untitled"
- start_index = self._to_int(item.get("page_start_index"), None)
- end_index = self._to_int(item.get("page_end_index"), start_index)
-
- if start_index is None:
- start = self._to_int(item.get("start"), None)
- end = self._to_int(item.get("end"), start)
- if start is None:
- continue
- start_index = max(0, start - 1)
- end_index = max(start_index, (end if end is not None else start) - 1)
-
- if end_index is None:
- end_index = start_index
-
- result.append(
- {
- "title": title,
- "page_start_index": max(0, start_index),
- "page_end_index": max(start_index, end_index),
- }
- )
- return result
-
- def _detect_headers_footers(self, doc: fitz.Document, num_pages: int) -> set[str]:
- margin_ratio = 0.08
- sample_count = min(num_pages, 30)
- text_counts: dict[str, int] = {}
-
- for idx in range(sample_count):
- page = doc[idx]
- page_height = page.rect.height
- top_limit = page_height * margin_ratio
- bottom_limit = page_height * (1 - margin_ratio)
- try:
- blocks = page.get_text("blocks", sort=True) or []
- except Exception:
- continue
-
- seen: set[str] = set()
- for block in blocks:
- if len(block) < 7 or block[6] != 0:
- continue
- y0, y1 = block[1], block[3]
- text = (block[4] or "").strip()
- if not text or len(text) < 2 or text in seen:
- continue
- if y1 <= top_limit or y0 >= bottom_limit:
- seen.add(text)
- text_counts[text] = text_counts.get(text, 0) + 1
-
- threshold = max(3, sample_count * 0.35)
- return {text for text, count in text_counts.items() if count >= threshold}
-
- def _page_to_markdown(self, page: fitz.Page, hf_texts: set[str]) -> str:
- parts: list[str] = []
- page_height = page.rect.height
- top_margin = page_height * 0.06
- bottom_margin = page_height * 0.94
-
- table_rects: list[fitz.Rect] = []
- table_mds: list[str] = []
- try:
- find_tables = getattr(page, "find_tables", None)
- tables = []
- if callable(find_tables):
- table_finder = find_tables()
- tables = getattr(table_finder, "tables", []) or []
-
- for table in tables[:5]:
- try:
- table_rects.append(fitz.Rect(table.bbox))
- except Exception:
- pass
-
- cells = table.extract() or []
- if len(cells) < 2:
- continue
- if hf_texts and len(cells) <= 3:
- flat = " ".join(str(cell or "") for row in cells for cell in row)
- if any(hf in flat for hf in hf_texts):
- continue
-
- md_table = self._cells_to_md_table(cells)
- if md_table:
- table_mds.append(md_table)
- except Exception:
- pass
-
- try:
- blocks = page.get_text("blocks", sort=True) or []
- except Exception:
- blocks = []
-
- for block in blocks:
- if len(block) < 7 or block[6] != 0:
- continue
- x0, y0, x1, y1 = block[:4]
- text = (block[4] or "").strip()
- if not text:
- continue
-
- block_rect = fitz.Rect(x0, y0, x1, y1)
- if any(self._rects_overlap(block_rect, table_rect) for table_rect in table_rects):
- continue
- if hf_texts and (y1 <= top_margin or y0 >= bottom_margin):
- if any(hf in text for hf in hf_texts):
- continue
- if re.fullmatch(r"\s*\d{1,4}\s*", text):
- continue
-
- parts.append(text)
-
- parts.extend(table_mds)
- return "\n\n".join(parts)
-
- def _assemble_by_catalog(self, catalog: list[dict[str, Any]], page_mds: list[str], num_pages: int) -> str:
- parts: list[str] = []
- used_pages: set[int] = set()
-
- for item in catalog:
- start = max(0, min(int(item["page_start_index"]), num_pages - 1))
- end = max(start, min(int(item["page_end_index"]), num_pages - 1))
-
- chapter_parts = [f"# {item['title']}\n"]
- for idx in range(start, end + 1):
- if idx < len(page_mds) and page_mds[idx].strip() and idx not in used_pages:
- chapter_parts.append(page_mds[idx])
- used_pages.add(idx)
-
- if len(chapter_parts) > 1:
- parts.append("\n\n".join(chapter_parts))
-
- if parts:
- return "\n\n---\n\n".join(parts)
- return "\n\n---\n\n".join(m for m in page_mds if m.strip())
-
- @staticmethod
- def _rects_overlap(block_rect: fitz.Rect, table_rect: fitz.Rect) -> bool:
- inter = block_rect & table_rect
- if inter.is_empty:
- return False
- block_area = block_rect.width * block_rect.height
- if block_area <= 0:
- return False
- return (inter.width * inter.height) / block_area >= 0.3
-
- @staticmethod
- def _cells_to_md_table(cells: list) -> str:
- if not cells:
- return ""
-
- header = cells[0]
- ncols = len(header)
- if ncols == 0:
- return ""
-
- def clean(value: Any) -> str:
- return str(value or "").replace("|", "\\|").replace("\n", " ").strip()
-
- lines = [
- "| " + " | ".join(clean(cell) for cell in header) + " |",
- "| " + " | ".join("---" for _ in range(ncols)) + " |",
- ]
- for row in cells[1:]:
- padded = list(row) + [""] * max(0, ncols - len(row))
- lines.append("| " + " | ".join(clean(cell) for cell in padded[:ncols]) + " |")
- return "\n".join(lines)
-
- @staticmethod
- def _to_int(value: Any, default: int | None) -> int | None:
- try:
- if value is None or value == "":
- return default
- return int(value)
- except Exception:
- return default
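Review note on the removed `_cells_to_md_table`: the helper treats the first row as the header, escapes `|`, flattens newlines, and pads or trims every later row to the header width. A minimal standalone sketch of that behavior follows, assuming plain Python lists as input; the function name `cells_to_md_table` is illustrative and not part of the plugin.

```python
from typing import Any


def cells_to_md_table(cells: list[list[Any]]) -> str:
    """Render rows of cells as a Markdown table; the first row is the header."""
    if not cells or not cells[0]:
        return ""
    ncols = len(cells[0])

    def clean(value: Any) -> str:
        # Escape pipes and flatten newlines so a cell cannot break the row.
        # Note: falsy values (None, 0, "") all render as an empty cell.
        return str(value or "").replace("|", "\\|").replace("\n", " ").strip()

    def render(row: list[Any]) -> str:
        # Pad short rows and trim long rows to the header width.
        padded = list(row) + [""] * max(0, ncols - len(row))
        return "| " + " | ".join(clean(cell) for cell in padded[:ncols]) + " |"

    lines = [render(cells[0]), "| " + " | ".join("---" for _ in range(ncols)) + " |"]
    lines.extend(render(row) for row in cells[1:])
    return "\n".join(lines)


table = cells_to_md_table([["Name", "Note"], ["a|b", None], ["only"]])
print(table)
```

One consequence of `str(value or "")` worth keeping in mind: a numeric cell holding `0` is rendered as empty, which may or may not be intended.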
diff --git a/difyPlugin/pdf/tools/pdf_to_markdown.yaml b/difyPlugin/pdf/tools/pdf_to_markdown.yaml
deleted file mode 100644
index 9a089a2b..00000000
--- a/difyPlugin/pdf/tools/pdf_to_markdown.yaml
+++ /dev/null
@@ -1,51 +0,0 @@
-identity:
- name: "pdf_to_markdown"
- author: "yslg"
- label:
- en_US: "PDF to Markdown"
- zh_Hans: "PDF to Markdown"
- pt_BR: "PDF para Markdown"
- ja_JP: "PDF to Markdown"
-description:
- human:
- en_US: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
- zh_Hans: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
- pt_BR: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
- ja_JP: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
- llm: "Convert a PDF file into Markdown using a catalog JSON array. Ignore images and graphics."
-parameters:
- - name: file
- type: file
- required: true
- label:
- en_US: PDF File
- zh_Hans: PDF File
- pt_BR: PDF File
- ja_JP: PDF File
- human_description:
- en_US: "PDF file to convert"
- zh_Hans: "PDF file to convert"
- pt_BR: "PDF file to convert"
- ja_JP: "PDF file to convert"
- llm_description: "PDF file to convert to Markdown"
- form: llm
- fileTypes:
- - "pdf"
- - name: catalog
- type: string
- required: true
- label:
- en_US: Catalog JSON
- zh_Hans: Catalog JSON
- pt_BR: Catalog JSON
- ja_JP: Catalog JSON
- human_description:
- en_US: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
- zh_Hans: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
- pt_BR: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
- ja_JP: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
- llm_description: "Catalog JSON array returned by pdf_toc"
- form: llm
-extra:
- python:
- source: tools/pdf_to_markdown.py
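Review note on the `catalog` parameter: the removed tool takes the catalog as a JSON string and groups per-page Markdown under each entry's title, using each page at most once. The sketch below mirrors the removed `_assemble_by_catalog` under the assumption that pages have already been converted to Markdown strings; `assemble_by_catalog` and the sample data are illustrative only.

```python
import json


def assemble_by_catalog(catalog: list[dict], page_mds: list[str]) -> str:
    """Group per-page Markdown under catalog titles, using each page once."""
    num_pages = len(page_mds)
    parts: list[str] = []
    used: set[int] = set()
    for item in catalog:
        # Clamp both indexes into the valid page range.
        start = max(0, min(int(item["page_start_index"]), num_pages - 1))
        end = max(start, min(int(item["page_end_index"]), num_pages - 1))
        chapter = [f"# {item['title']}\n"]
        for idx in range(start, end + 1):
            if idx < num_pages and page_mds[idx].strip() and idx not in used:
                chapter.append(page_mds[idx])
                used.add(idx)
        if len(chapter) > 1:  # skip chapters that contributed no pages
            parts.append("\n\n".join(chapter))
    if parts:
        return "\n\n---\n\n".join(parts)
    # No usable catalog entries: fall back to all non-empty pages.
    return "\n\n---\n\n".join(m for m in page_mds if m.strip())


catalog = json.loads(
    '[{"title": "Intro", "page_start_index": 0, "page_end_index": 1},'
    ' {"title": "Methods", "page_start_index": 1, "page_end_index": 2}]'
)
markdown = assemble_by_catalog(catalog, ["p0", "p1", "p2"])
print(markdown)
```

Because overlapping ranges are resolved first-come, a page shared by two chapters ends up in the earlier one only.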
diff --git a/difyPlugin/pdf/tools/pdf_toc.py b/difyPlugin/pdf/tools/pdf_toc.py
deleted file mode 100644
index 12c1caf6..00000000
--- a/difyPlugin/pdf/tools/pdf_toc.py
+++ /dev/null
@@ -1,312 +0,0 @@
-import json
-import re
-from collections import OrderedDict
-from collections.abc import Generator
-from typing import Any
-
-import fitz
-from dify_plugin import Tool
-from dify_plugin.entities.model.llm import LLMModelConfig
-from dify_plugin.entities.model.message import SystemPromptMessage, UserPromptMessage
-from dify_plugin.entities.tool import ToolInvokeMessage
-
-_TOC_SYSTEM_PROMPT = """你是专业的PDF目录解析助手。请从以下PDF文本中提取文档的目录/章节结构。
-
-要求:
-1. 识别所有一级和二级标题及其对应的页码
-2. 只返回纯JSON数组,不要markdown代码块,不要任何解释
-3. 格式: [{"title": "章节标题", "page": 页码数字}]
-4. 页码必须是文档中标注的实际页码数字
-5. 如果无法识别目录,返回空数组 []"""
-
-
-class PdfTocTool(Tool):
- _TOC_PATTERNS = [
- r"目录",
- r"目\s*录",
- r"目\u3000录",
- r"Table of Contents",
- r"Contents",
- r"目次",
- ]
-
- def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
- file = tool_parameters.get("file")
- if not file:
- yield self.create_text_message("Error: file is required")
- return
-
- model_config = tool_parameters.get("model")
-
- doc = fitz.open(stream=file.blob, filetype="pdf")
- try:
- num_pages = len(doc)
-
- # 1) Prefer the table of contents from the PDF metadata
- catalog = self._catalog_from_metadata(doc.get_toc(), num_pages)
-
- # 2) If the metadata has no table of contents, parse it with the LLM
- if not catalog and model_config:
- catalog = self._extract_toc_with_llm(doc, num_pages, model_config)
-
- # 3) Fall back to regex parsing when there is still no catalog (no LLM config, or the LLM returned nothing)
- if not catalog:
- toc_start, toc_end = self._find_toc_pages(doc, num_pages)
- if toc_start is not None and toc_end is not None:
- toc_text = "\n".join(
- doc[index].get_text() or "" for index in range(toc_start, toc_end + 1)
- )
- printed_catalog = self._parse_toc_lines(toc_text)
- catalog = self._attach_page_indexes(printed_catalog, toc_end, num_pages)
-
- if not catalog:
- catalog = []
-
- yield self.create_text_message(json.dumps(catalog, ensure_ascii=False))
- finally:
- doc.close()
-
- def _extract_toc_with_llm(
- self, doc: fitz.Document, num_pages: int, model_config: dict[str, Any]
- ) -> list[dict[str, int | str]]:
- # First try to locate the table-of-contents pages
- toc_start, toc_end = self._find_toc_pages(doc, num_pages)
-
- if toc_start is not None and toc_end is not None:
- # TOC pages found: extract their text
- toc_text = "\n".join(
- doc[index].get_text() or "" for index in range(toc_start, toc_end + 1)
- )
- content_offset = toc_end
- else:
- # No TOC pages: extract the first 15 pages so the LLM can infer the chapter structure
- sample = min(num_pages, 15)
- toc_text = "\n\n".join(
- f"--- 第{i + 1}页 ---\n{doc[i].get_text() or ''}" for i in range(sample)
- )
- toc_text = toc_text.strip()
- if not toc_text:
- return []
- content_offset = 0
-
- # Truncate overly long text
- if len(toc_text) > 15000:
- toc_text = toc_text[:15000] + "\n...[截断]"
-
- try:
- response = self.session.model.llm.invoke(
- model_config=LLMModelConfig(**model_config),
- prompt_messages=[
- SystemPromptMessage(content=_TOC_SYSTEM_PROMPT),
- UserPromptMessage(content=toc_text),
- ],
- stream=False,
- )
-
- llm_text = self._get_response_text(response)
- if not llm_text:
- return []
-
- raw_catalog = self._parse_llm_json(llm_text)
- if not raw_catalog:
- return []
-
- # Convert the simple format returned by the LLM into full catalog entries
- return self._build_catalog_from_llm(raw_catalog, content_offset, num_pages)
- except Exception:
- return []
-
- def _build_catalog_from_llm(
- self, raw: list[dict], content_offset: int, num_pages: int
- ) -> list[dict[str, int | str]]:
- entries: list[tuple[str, int]] = []
- for item in raw:
- title = str(item.get("title") or "").strip()
- page = self._to_int(item.get("page"), None)
- if not title or page is None:
- continue
- entries.append((title, page))
-
- if not entries:
- return []
-
- # Offset: difference between the first entry's printed page number and the actual content start page
- first_printed_page = entries[0][1]
- offset = (content_offset + 1) - first_printed_page if content_offset > 0 else 0
-
- result: list[dict[str, int | str]] = []
- for i, (title, page) in enumerate(entries):
- next_page = entries[i + 1][1] if i + 1 < len(entries) else page
- page_start_index = max(0, min(page + offset - 1, num_pages - 1))
- page_end_index = max(page_start_index, min(next_page + offset - 2, num_pages - 1))
- if i == len(entries) - 1:
- page_end_index = num_pages - 1
-
- result.append({
- "title": title,
- "start": page,
- "end": max(page, next_page - 1) if i + 1 < len(entries) else page,
- "page_start_index": page_start_index,
- "page_end_index": page_end_index,
- })
-
- return result
-
- @staticmethod
- def _get_response_text(response: Any) -> str:
- if not hasattr(response, "message") or not response.message:
- return ""
- content = response.message.content
- if isinstance(content, str):
- text = content
- elif isinstance(content, list):
- text = "".join(
- item.data if hasattr(item, "data") else str(item) for item in content
- )
- else:
- text = str(content)
-
- # Strip thinking tags
- text = re.sub(r"<think>[\s\S]*?</think>", "", text, flags=re.IGNORECASE)
- text = re.sub(r"<\|[^>]+\|>", "", text)
- return text.strip()
-
- @staticmethod
- def _parse_llm_json(text: str) -> list[dict]:
- # Try to extract a fenced JSON code block
- code_match = re.search(r"```(?:json)?\s*([\s\S]*?)```", text)
- if code_match:
- text = code_match.group(1).strip()
-
- # Try to find a JSON array
- bracket_match = re.search(r"\[[\s\S]*\]", text)
- if bracket_match:
- text = bracket_match.group(0)
-
- try:
- result = json.loads(text)
- if isinstance(result, list):
- return result
- except Exception:
- pass
- return []
-
- def _catalog_from_metadata(self, toc: list, num_pages: int) -> list[dict[str, int | str]]:
- top = [(title, max(0, page - 1)) for level, title, page in toc if level <= 2 and page >= 1]
- if not top:
- return []
-
- result: list[dict[str, int | str]] = []
- for index, (title, start_index) in enumerate(top):
- end_index = top[index + 1][1] - 1 if index + 1 < len(top) else num_pages - 1
- result.append({
- "title": title,
- "start": start_index + 1,
- "end": max(start_index, end_index) + 1,
- "page_start_index": start_index,
- "page_end_index": max(start_index, end_index),
- })
- return result
-
- def _find_toc_pages(self, doc: fitz.Document, num_pages: int) -> tuple[int | None, int | None]:
- toc_start = None
- toc_end = None
- for page_number in range(min(num_pages, 30)):
- text = doc[page_number].get_text() or ""
- if any(re.search(pattern, text, re.IGNORECASE) for pattern in self._TOC_PATTERNS):
- if toc_start is None:
- toc_start = page_number
- toc_end = page_number
- elif toc_start is not None:
- break
- return toc_start, toc_end
-
- def _parse_toc_lines(self, text: str) -> list[dict[str, int | str]]:
- marker = re.search(
- r"^(List\s+of\s+Figures|List\s+of\s+Tables|图目录|表目录)",
- text,
- re.IGNORECASE | re.MULTILINE,
- )
- if marker:
- text = text[: marker.start()]
-
- pattern = re.compile(r"^\s*(?P<title>.+?)\s*(?:\.{2,}|\s)\s*(?P<page>\d{1,5})\s*$")
- entries: list[tuple[str, int]] = []
- for raw in text.splitlines():
- line = raw.strip()
- if not line or len(line) < 3 or re.fullmatch(r"\d+", line):
- continue
-
- match = pattern.match(line)
- if not match:
- continue
-
- title = re.sub(r"\s+", " ", match.group("title")).strip("-_::")
- page = self._to_int(match.group("page"), None)
- if not title or page is None or len(title) <= 1:
- continue
- if title.lower() in {"page", "pages", "目录", "contents"}:
- continue
-
- entries.append((title, page))
-
- if not entries:
- return []
-
- dedup: OrderedDict[str, int] = OrderedDict()
- for title, page in entries:
- dedup.setdefault(title, page)
-
- titles = list(dedup.keys())
- pages = [dedup[title] for title in titles]
- result: list[dict[str, int | str]] = []
- for index, title in enumerate(titles):
- start = pages[index]
- end = max(start, pages[index + 1] - 1) if index + 1 < len(pages) else start
- result.append({"title": title, "start": start, "end": end})
- return result
-
- def _attach_page_indexes(
- self, catalog: list[dict[str, int | str]], toc_end: int, num_pages: int
- ) -> list[dict[str, int | str]]:
- if not catalog:
- return []
-
- first_page = None
- for item in catalog:
- start = self._to_int(item.get("start"), None)
- if start is not None and (first_page is None or start < first_page):
- first_page = start
-
- if first_page is None:
- return []
-
- offset = (toc_end + 1) - first_page
- result: list[dict[str, int | str]] = []
- for item in catalog:
- start = self._to_int(item.get("start"), None)
- end = self._to_int(item.get("end"), start)
- if start is None:
- continue
- if end is None:
- end = start
-
- page_start_index = max(0, min(start + offset, num_pages - 1))
- page_end_index = max(page_start_index, min(end + offset, num_pages - 1))
- result.append({
- "title": str(item.get("title") or "Untitled"),
- "start": start,
- "end": max(start, end),
- "page_start_index": page_start_index,
- "page_end_index": page_end_index,
- })
- return result
-
- @staticmethod
- def _to_int(value: Any, default: int | None) -> int | None:
- try:
- if value is None or value == "":
- return default
- return int(value)
- except Exception:
- return default
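Review note on the regex fallback in the removed `_parse_toc_lines`: each printed TOC line is matched as a title, then dot leaders or whitespace, then a 1-5 digit page number, read back through the named groups `title` and `page`. The following is a reduced sketch of that matching step only (no de-duplication or end-page computation); `parse_toc_lines` as a free function and the sample text are assumptions for illustration.

```python
import re

# A title, then dot leaders or a single whitespace, then the printed page number.
TOC_LINE = re.compile(r"^\s*(?P<title>.+?)\s*(?:\.{2,}|\s)\s*(?P<page>\d{1,5})\s*$")


def parse_toc_lines(text: str) -> list[tuple[str, int]]:
    """Return (title, printed_page) pairs for lines that look like TOC entries."""
    entries: list[tuple[str, int]] = []
    for raw in text.splitlines():
        line = raw.strip()
        # Skip blanks, very short lines, and bare page numbers.
        if len(line) < 3 or re.fullmatch(r"\d+", line):
            continue
        match = TOC_LINE.match(line)
        if not match:
            continue
        title = re.sub(r"\s+", " ", match.group("title")).strip("-_:")
        if len(title) <= 1:
            continue
        entries.append((title, int(match.group("page"))))
    return entries


sample = "Contents\nChapter 1 Overview ........ 1\n2.1 Scope 12\n12\n"
entries = parse_toc_lines(sample)
print(entries)
```

The lazy `title` group is what lets section numbers such as `2.1` stay inside the title while the trailing digits are captured as the page.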
diff --git a/difyPlugin/pdf/tools/pdf_toc.yaml b/difyPlugin/pdf/tools/pdf_toc.yaml
deleted file mode 100644
index 0916a700..00000000
--- a/difyPlugin/pdf/tools/pdf_toc.yaml
+++ /dev/null
@@ -1,51 +0,0 @@
-identity:
- name: "pdf_toc"
- author: "yslg"
- label:
- en_US: "PDF TOC"
- zh_Hans: "PDF 目录提取"
- pt_BR: "PDF TOC"
- ja_JP: "PDF TOC"
-description:
- human:
- en_US: "Extract the catalog array from a PDF file using metadata or LLM."
- zh_Hans: "从PDF文件中提取目录数组,优先使用元数据,回退使用LLM解析。"
- pt_BR: "Extrair o array de catálogo de um arquivo PDF."
- ja_JP: "PDFファイルからカタログ配列を抽出する。"
- llm: "Extract a catalog array from a PDF file. Returns JSON text like [{title,start,end,page_start_index,page_end_index}]."
-parameters:
- - name: file
- type: file
- required: true
- label:
- en_US: PDF File
- zh_Hans: PDF 文件
- pt_BR: PDF File
- ja_JP: PDF File
- human_description:
- en_US: "PDF file to inspect"
- zh_Hans: "要解析的PDF文件"
- pt_BR: "PDF file to inspect"
- ja_JP: "PDF file to inspect"
- llm_description: "PDF file to extract catalog from"
- form: llm
- fileTypes:
- - "pdf"
- - name: model
- type: model-selector
- scope: llm
- required: true
- label:
- en_US: LLM Model
- zh_Hans: LLM 模型
- pt_BR: Modelo LLM
- ja_JP: LLMモデル
- human_description:
- en_US: "LLM model used for parsing TOC when metadata is unavailable"
- zh_Hans: "当元数据不可用时,用于解析目录的LLM模型"
- pt_BR: "Modelo LLM para análise de TOC"
- ja_JP: "メタデータが利用できない場合のTOC解析用LLMモデル"
- form: form
-extra:
- python:
- source: tools/pdf_toc.py
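Review note on the returned catalog shape `[{title,start,end,page_start_index,page_end_index}]`: in the metadata path, the removed `_catalog_from_metadata` keeps level 1 and 2 outline entries and ends each chapter on the page before the next entry starts. The sketch below reproduces that mapping on plain `[level, title, page]` triples, which is the shape PyMuPDF's `Document.get_toc()` returns in its default mode; the function name and the sample outline are illustrative.

```python
def catalog_from_toc(toc: list[list], num_pages: int) -> list[dict]:
    """Build catalog entries from [level, title, 1-based page] outline triples."""
    # Keep only level 1-2 entries and convert pages to 0-based indexes.
    top = [(title, page - 1) for level, title, page in toc if level <= 2 and page >= 1]
    result: list[dict] = []
    for i, (title, start) in enumerate(top):
        # A chapter ends on the page before the next entry; the last one runs to the end.
        end = top[i + 1][1] - 1 if i + 1 < len(top) else num_pages - 1
        end = max(start, end)
        result.append({
            "title": title,
            "start": start + 1,
            "end": end + 1,
            "page_start_index": start,
            "page_end_index": end,
        })
    return result


outline = [[1, "Intro", 1], [2, "Background", 3], [3, "Detail", 4], [1, "Methods", 5]]
catalog = catalog_from_toc(outline, num_pages=10)
print(catalog)
```

Here `start` and `end` are 1-based page numbers, while the `*_index` fields are 0-based positions into the document.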
diff --git a/difyPlugin/数据清洗-大文件处理.yml.bak b/difyPlugin/数据清洗-大文件处理.yml.bak
deleted file mode 100644
index 70ca4830..00000000
--- a/difyPlugin/数据清洗-大文件处理.yml.bak
+++ /dev/null
@@ -1,1000 +0,0 @@
-app:
- description: 优化版:支持大文件PDF处理,跨页表格/段落智能识别合并
- icon: 🤖
- icon_background: '#FFEAD5'
- mode: workflow
- name: 数据清洗-大文件处理
- use_icon_as_answer_icon: false
-dependencies:
-- current_identifier: null
- type: marketplace
- value:
- marketplace_plugin_unique_identifier: samanhappy/word_process:0.0.1@003ecc76645cf2d5160d4e009a29d8eba2946eaaf7134c49971c3b9fedbfab0d
- version: null
-- current_identifier: null
- type: marketplace
- value:
- marketplace_plugin_unique_identifier: langgenius/siliconflow:0.0.44@9dac23fe837d6da24a2cd9ef959c1c93e4e094b7562ad8a2fd3d4cc86c0e3e89
- version: null
-- current_identifier: null
- type: marketplace
- value:
- marketplace_plugin_unique_identifier: bowenliang123/md_exporter:3.6.9@3f027d63e80b44d5d5a9f706871afaef37905b8f8a89a2d152dc530211a8acb1
- version: null
-- current_identifier: null
- type: package
- value:
- plugin_unique_identifier: yslg/pdf:0.0.1@5e83b87d38ad55c2a1e929311d21a86cef5f9e04394b977b3ba16eb34de08b36
- version: null
-kind: app
-version: 0.5.0
-workflow:
- conversation_variables: []
- environment_variables: []
- features:
- file_upload:
- allowed_file_extensions:
- - .JPG
- - .JPEG
- - .PNG
- - .GIF
- - .WEBP
- - .SVG
- - .PDF
- - .pdf
- allowed_file_types:
- - image
- - document
- allowed_file_upload_methods:
- - local_file
- - remote_url
- enabled: false
- fileUploadConfig:
- audio_file_size_limit: 50
- batch_count_limit: 5
- file_size_limit: 500
- image_file_batch_limit: 10
- image_file_size_limit: 10
- single_chunk_attachment_limit: 10
- video_file_size_limit: 100
- workflow_file_upload_limit: 10
- image:
- enabled: false
- number_limits: 3
- transfer_methods:
- - local_file
- - remote_url
- number_limits: 3
- opening_statement: ''
- retriever_resource:
- enabled: true
- sensitive_word_avoidance:
- enabled: false
- speech_to_text:
- enabled: false
- suggested_questions: []
- suggested_questions_after_answer:
- enabled: false
- text_to_speech:
- enabled: false
- language: ''
- voice: ''
- graph:
- edges:
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: start
- targetType: if-else
- id: 1770703294598-source-1770703342256-target
- selected: false
- source: '1770703294598'
- sourceHandle: source
- target: '1770703342256'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: if-else
- targetType: llm
- id: 1770703342256-true-1770703393190-target
- selected: false
- source: '1770703342256'
- sourceHandle: 'true'
- target: '1770703393190'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: if-else
- targetType: llm
- id: 1770703342256-93d5294c-5984-4bc0-b30d-cd9e2ffba28d-1770703524412-target
- selected: false
- source: '1770703342256'
- sourceHandle: 93d5294c-5984-4bc0-b30d-cd9e2ffba28d
- target: '1770703524412'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: llm
- targetType: variable-aggregator
- id: 1770703393190-source-1770703625287-target
- selected: false
- source: '1770703393190'
- sourceHandle: source
- target: '1770703625287'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInLoop: false
- sourceType: llm
- targetType: variable-aggregator
- id: 1770703524412-source-1770703625287-target
- selected: false
- source: '1770703524412'
- sourceHandle: source
- target: '1770703625287'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: if-else
- targetType: if-else
- id: 1770703342256-6556b05e-3266-4aa7-b196-ec41f5dd766b-1772348592076-target
- selected: false
- source: '1770703342256'
- sourceHandle: 6556b05e-3266-4aa7-b196-ec41f5dd766b
- target: '1772348592076'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInLoop: false
- sourceType: if-else
- targetType: document-extractor
- id: 1772348592076-false-1770703633813-target
- selected: false
- source: '1772348592076'
- sourceHandle: 'false'
- target: '1770703633813'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: if-else
- targetType: tool
- id: 1772348592076-0b4fd2d4-a592-4421-acbb-822db3004219-1772349027446-target
- selected: false
- source: '1772348592076'
- sourceHandle: 0b4fd2d4-a592-4421-acbb-822db3004219
- target: '1772349027446'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: document-extractor
- targetType: variable-aggregator
- id: 1770703633813-source-1772348969241-target
- selected: false
- source: '1770703633813'
- sourceHandle: source
- target: '1772348969241'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInLoop: false
- sourceType: tool
- targetType: variable-aggregator
- id: 1772349027446-source-1772348969241-target
- selected: false
- source: '1772349027446'
- sourceHandle: source
- target: '1772348969241'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: variable-aggregator
- targetType: llm
- id: 1770703625287-source-1770703671732-target
- selected: false
- source: '1770703625287'
- sourceHandle: source
- target: '1770703671732'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: llm
- targetType: tool
- id: 1770703671732-source-1770704285657-target
- selected: false
- source: '1770703671732'
- sourceHandle: source
- target: '1770704285657'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: if-else
- targetType: tool
- id: 1772348592076-true-1772527425324-target
- selected: false
- source: '1772348592076'
- sourceHandle: 'true'
- target: '1772527425324'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInLoop: false
- sourceType: variable-aggregator
- targetType: variable-aggregator
- id: 1772348969241-source-1770703625287-target
- source: '1772348969241'
- sourceHandle: source
- target: '1770703625287'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInLoop: false
- sourceType: tool
- targetType: end
- id: 1770704285657-source-1770704288628-target
- source: '1770704285657'
- sourceHandle: source
- target: '1770704288628'
- targetHandle: target
- type: custom
- zIndex: 0
- - data:
- isInIteration: false
- isInLoop: false
- sourceType: tool
- targetType: end
- id: 1772527425324-source-1772779766541-target
- source: '1772527425324'
- sourceHandle: source
- target: '1772779766541'
- targetHandle: target
- type: custom
- zIndex: 0
- nodes:
- - data:
- selected: false
- title: 用户输入
- type: start
- variables:
- - allowed_file_extensions: []
- allowed_file_types:
- - image
- - document
- - video
- allowed_file_upload_methods:
- - local_file
- - remote_url
- default: ''
- hint: ''
- label: 文件
- max_length: 48
- options: []
- placeholder: ''
- required: true
- type: file
- variable: file
- height: 109
- id: '1770703294598'
- position:
- x: 0
- y: 55
- positionAbsolute:
- x: 0
- y: 55
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- cases:
- - case_id: 'true'
- conditions:
- - comparison_operator: in
- id: f88f279e-5736-4b1b-98cf-f8a9621531a0
- value:
- - image
- varType: file
- variable_selector:
- - '1770703294598'
- - file
- - type
- id: 'true'
- logical_operator: and
- - case_id: 93d5294c-5984-4bc0-b30d-cd9e2ffba28d
- conditions:
- - comparison_operator: in
- id: 48e8d32a-59c5-4573-8e8a-355dc73a39fc
- value:
- - video
- varType: file
- variable_selector:
- - '1770703294598'
- - file
- - type
- id: 93d5294c-5984-4bc0-b30d-cd9e2ffba28d
- logical_operator: and
- - case_id: 6556b05e-3266-4aa7-b196-ec41f5dd766b
- conditions:
- - comparison_operator: in
- id: 9916110c-edf7-4a4a-b324-2f8d85c73299
- value:
- - document
- varType: file
- variable_selector:
- - '1770703294598'
- - file
- - type
- id: 6556b05e-3266-4aa7-b196-ec41f5dd766b
- logical_operator: and
- selected: false
- title: 条件分支
- type: if-else
- height: 220
- id: '1770703342256'
- position:
- x: 342
- y: 0
- positionAbsolute:
- x: 342
- y: 0
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- context:
- enabled: false
- variable_selector: []
- model:
- completion_params:
- enable_thinking: true
- temperature: 0.7
- mode: chat
- name: zai-org/GLM-4.6V
- provider: langgenius/siliconflow/siliconflow
- prompt_template:
- - id: 4b1706f6-3216-4fb7-a6dc-978ce43ff491
- role: system
- text: 识别图片中所有内容和文字,并进行合理的描述编排
- reasoning_format: separated
- selected: false
- title: 图片理解
- type: llm
- vision:
- configs:
- detail: high
- variable_selector:
- - '1770703294598'
- - file
- enabled: true
- height: 88
- id: '1770703393190'
- position:
- x: 2772
- y: 82
- positionAbsolute:
- x: 2772
- y: 82
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- context:
- enabled: false
- variable_selector: []
- model:
- completion_params: {}
- mode: chat
- name: Pro/moonshotai/Kimi-K2.5
- provider: langgenius/siliconflow/siliconflow
- prompt_template:
- - id: 497bebc3-5e75-4c2b-940c-ba485dc1e51a
- role: system
- text: 识别视频中所有内容和文字,并进行合理的描述编排
- reasoning_format: separated
- selected: false
- title: 视频理解
- type: llm
- vision:
- configs:
- detail: high
- variable_selector:
- - '1770703294598'
- - file
- enabled: true
- height: 88
- id: '1770703524412'
- position:
- x: 1770
- y: 177
- positionAbsolute:
- x: 1770
- y: 177
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- cases:
- - case_id: 'true'
- conditions:
- - comparison_operator: contains
- id: 7a6d2b1e-9704-41f3-aeba-40c6e2484d56
- value: pdf
- varType: string
- variable_selector:
- - '1770703294598'
- - file
- - extension
- id: 'true'
- logical_operator: and
- - case_id: 0b4fd2d4-a592-4421-acbb-822db3004219
- conditions:
- - comparison_operator: contains
- id: 67767b34-ad03-48f4-80ef-100eb78e13ab
- value: doc
- varType: file
- variable_selector:
- - '1770703294598'
- - file
- - extension
- logical_operator: and
- selected: false
- title: 条件分支 2
- type: if-else
- height: 172
- id: '1772348592076'
- position:
- x: 704
- y: 424
- positionAbsolute:
- x: 704
- y: 424
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- is_array_file: false
- selected: false
- title: 文档提取器
- type: document-extractor
- variable_selector:
- - '1770703294598'
- - file
- height: 104
- id: '1770703633813'
- position:
- x: 1066
- y: 337
- positionAbsolute:
- x: 1066
- y: 337
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- is_team_authorization: true
- paramSchemas:
- - auto_generate: null
- default: null
- form: llm
- human_description:
- en_US: Word file to extract text and images from
- ja_JP: Word file to extract text and images from
- pt_BR: Word file to extract text and images from
- zh_Hans: 要提取文本和图片的Word文件
- label:
- en_US: Word Content
- ja_JP: Word Content
- pt_BR: Word Content
- zh_Hans: Word 内容
- llm_description: Word file content to be extracted
- max: null
- min: null
- name: word_content
- options: []
- placeholder: null
- precision: null
- required: true
- scope: null
- template: null
- type: file
- params:
- word_content: ''
- plugin_id: samanhappy/word_process
- plugin_unique_identifier: samanhappy/word_process:0.0.1@003ecc76645cf2d5160d4e009a29d8eba2946eaaf7134c49971c3b9fedbfab0d
- provider_icon: https://dify.org.xyzh.yslg/console/api/workspaces/current/plugin/icon?tenant_id=fe3bcf55-9a04-4850-8473-7f97e1c09b97&filename=cb0643689e2f8152d38c44a267a459fae99ff208b0bc164e27ccb053fc1844cd.svg
- provider_id: samanhappy/word_process/word_process
- provider_name: samanhappy/word_process/word_process
- provider_type: builtin
- selected: false
- title: Word提取器
- tool_configurations: {}
- tool_description: 一个将Word文件提取为文本和图片的工具
- tool_label: Word提取器
- tool_name: word_extractor
- tool_node_version: '2'
- tool_parameters:
- word_content:
- type: variable
- value:
- - '1770703294598'
- - file
- type: tool
- height: 52
- id: '1772349027446'
- position:
- x: 1066
- y: 521
- positionAbsolute:
- x: 1066
- y: 521
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- output_type: string
- selected: false
- title: 文档提取聚合
- type: variable-aggregator
- variables:
- - - '1772349027446'
- - text
- - - '1770703633813'
- - text
- height: 134
- id: '1772348969241'
- position:
- x: 1428
- y: 344
- positionAbsolute:
- x: 1428
- y: 344
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- advanced_settings:
- group_enabled: false
- groups:
- - groupId: 058efed3-3c6a-44d6-8f40-704abda8c413
- group_name: Group1
- output_type: string
- variables:
- - - '1770703393190'
- - text
- - - '1770703524412'
- - text
- - - '1772349100004'
- - result
- output_type: string
- selected: false
- title: 文件提取聚合
- type: variable-aggregator
- variables:
- - - '1770703393190'
- - text
- - - '1770703524412'
- - text
- - - '1772348969241'
- - output
- height: 160
- id: '1770703625287'
- position:
- x: 3134
- y: 291
- positionAbsolute:
- x: 3134
- y: 291
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- context:
- enabled: false
- variable_selector: []
- model:
- completion_params:
- temperature: 0.3
- mode: chat
- name: Qwen/Qwen3-32B
- provider: langgenius/siliconflow/siliconflow
- prompt_template:
- - id: 48ec1856-fdd7-4f4a-9ce5-1aa635822550
- role: system
- text: '你是一个专业的文档整理和合并专家。以下内容是从文档中分块提取并格式化的Markdown文本。由于分块处理,各块之间可能存在跨页断裂和重复内容,需要你进行智能合并。
-
-
- ## 你的任务
-
-
- ### 1. 合并跨页表格
-
- - 找到所有 `` 和对应的 `` 标记
-
- - 将前一块末尾的不完整表格和后一块开头的延续表格合并为一个完整表格
-
- - 确保表头只保留一份,数据行完整拼接,表格结构正确
-
-
- ### 2. 合并跨页段落
-
- - 找到所有 `` 和 ``
- 标记
-
- - 将被截断的段落拼接为语义完整的段落
-
-
- ### 3. 合并跨页列表
-
- - 找到所有 `` 和 ``
- 标记
-
- - 将被截断的列表合并为完整列表,确保编号连续
-
-
- ### 4. 去除重复内容
-
- - 由于分块时存在页面重叠,相邻块之间可能有重复的段落、表格行或列表项
-
- - 识别并去除这些重复内容,每段内容只保留一份
-
-
- ### 5. 清理所有辅助标记
-
- - 移除所有 `` 形式的辅助标记和块分隔符
-
- - 确保最终输出中不包含任何HTML注释或处理标记
-
-
- ### 6. 格式规范化
-
- - 确保标题层级正确且连续
-
- - 确保表格格式完整(有表头行和分隔行)
-
- - 确保列表编号连续
-
- - 统一全文格式风格
-
-
- 直接输出最终的Markdown内容,不要用```markdown```包裹。
-
-
- 以下是需要整理合并的内容:
-
- {{#1770703625287.output#}}'
- reasoning_format: separated
- selected: false
- title: 数据清洗与跨页合并
- type: llm
- vision:
- enabled: false
- height: 88
- id: '1770703671732'
- position:
- x: 3660
- y: 327
- positionAbsolute:
- x: 3660
- y: 327
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- is_team_authorization: true
- paramSchemas:
- - auto_generate: null
- default: null
- form: llm
- human_description:
- en_US: Markdown text
- ja_JP: Markdown text
- pt_BR: Markdown text
- zh_Hans: Markdown格式文本
- label:
- en_US: Markdown text
- ja_JP: Markdown text
- pt_BR: Markdown text
- zh_Hans: Markdown格式文本
- llm_description: ''
- max: null
- min: null
- name: md_text
- options: []
- placeholder: null
- precision: null
- required: true
- scope: null
- template: null
- type: string
- - auto_generate: null
- default: null
- form: llm
- human_description:
- en_US: Optional custom output file name, and the filename suffix is not
- required.
- ja_JP: Optional custom output file name, and the filename suffix is not
- required.
- pt_BR: Optional custom output file name, and the filename suffix is not
- required.
- zh_Hans: 可选的自定义输出文件名,后缀名无需指定
- label:
- en_US: Output Filename
- ja_JP: Output Filename
- pt_BR: Output Filename
- zh_Hans: 输出文件名
- llm_description: ''
- max: null
- min: null
- name: output_filename
- options: []
- placeholder: null
- precision: null
- required: false
- scope: null
- template: null
- type: string
- params:
- md_text: ''
- output_filename: ''
- plugin_id: bowenliang123/md_exporter
- plugin_unique_identifier: bowenliang123/md_exporter:3.4.0@a5ce3ac3114f3dd6ab4fe49f0bb931a31af49ff555e479ec45e8aaa5d44157ee
- provider_icon: https://dify.org.xyzh.yslg/console/api/workspaces/current/plugin/icon?tenant_id=fe3bcf55-9a04-4850-8473-7f97e1c09b97&filename=f0bad95cda1671b4e49f0e05df6122ef9ec5d554e138f128795d11d3806c00ef.svg
- provider_id: bowenliang123/md_exporter/md_exporter
- provider_name: bowenliang123/md_exporter/md_exporter
- provider_type: builtin
- selected: false
- title: Markdown ⮕ MD
- tool_configurations: {}
- tool_description: 将 Markdown 转换为 .md 文件的工具
- tool_label: Markdown ⮕ MD
- tool_name: md_to_md
- tool_node_version: '2'
- tool_parameters:
- md_text:
- type: mixed
- value: '{{#1770703671732.text#}}'
- output_filename:
- type: mixed
- value: ''
- type: tool
- height: 52
- id: '1770704285657'
- position:
- x: 4231.079190350343
- y: 573.1529224498603
- positionAbsolute:
- x: 4231.079190350343
- y: 573.1529224498603
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- outputs:
- - value_selector:
- - '1770704285657'
- - files
- value_type: array[file]
- variable: _
- selected: false
- title: 输出
- type: end
- height: 88
- id: '1770704288628'
- position:
- x: 5142.505374898874
- y: 614.2288378497078
- positionAbsolute:
- x: 5142.505374898874
- y: 614.2288378497078
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- is_team_authorization: true
- paramSchemas:
- - auto_generate: null
- default: null
- form: llm
- human_description:
- en_US: PDF file to convert
- ja_JP: 変換するPDFファイル
- pt_BR: Arquivo PDF para converter
- zh_Hans: 要转换的 PDF 文件
- label:
- en_US: PDF File
- ja_JP: PDFファイル
- pt_BR: Arquivo PDF
- zh_Hans: PDF 文件
- llm_description: PDF file to convert to Markdown
- max: null
- min: null
- name: file
- options: []
- placeholder: null
- precision: null
- required: true
- scope: null
- template: null
- type: file
- - auto_generate: null
- default: true
- form: form
- human_description:
- en_US: Whether to embed images as base64 (default true)
- ja_JP: 画像をbase64として埋め込むか
- pt_BR: Se deve incorporar imagens como base64
- zh_Hans: 是否将图片以base64嵌入(默认是)
- label:
- en_US: Include Images
- ja_JP: 画像を含める
- pt_BR: Incluir Imagens
- zh_Hans: 包含图片
- llm_description: Set to true to embed images as base64
- max: null
- min: null
- name: include_images
- options: []
- placeholder: null
- precision: null
- required: false
- scope: null
- template: null
- type: boolean
- - auto_generate: null
- default: 150
- form: form
- human_description:
- en_US: DPI for rendering vector drawings (72-300)
- ja_JP: ベクター描画のDPI
- pt_BR: DPI para renderizar desenhos vetoriais
- zh_Hans: 矢量图渲染DPI(72-300,默认150)
- label:
- en_US: Image DPI
- ja_JP: 画像DPI
- pt_BR: DPI da Imagem
- zh_Hans: 图片DPI
- llm_description: Resolution for rendering vector drawings
- max: null
- min: null
- name: image_dpi
- options: []
- placeholder: null
- precision: null
- required: false
- scope: null
- template: null
- type: number
- params:
- file: ''
- image_dpi: ''
- include_images: ''
- plugin_id: yslg/pdf
- plugin_unique_identifier: yslg/pdf:0.0.1@cc5f6665002ca7c06855ef6703ee9f6e051ddbfb3d00d2aa899f9f280f45dd61
- provider_icon: https://dify.org.xyzh.yslg/console/api/workspaces/current/plugin/icon?tenant_id=fe3bcf55-9a04-4850-8473-7f97e1c09b97&filename=f1441c071a96f87326f5eb2ae2bfc5a570e9260e7d2b74c2ac15df4037231c64.svg
- provider_id: yslg/pdf/pdf
- provider_name: yslg/pdf/pdf
- provider_type: builtin
- selected: true
- title: PDF转Markdown
- tool_configurations:
- image_dpi:
- type: constant
- value: 150
- include_images:
- type: constant
- value: true
- model:
- type: constant
- value:
- completion_params: {}
- mode: chat
- model: Qwen/Qwen3-32B
- model_type: llm
- provider: langgenius/siliconflow/siliconflow
- tool_description: 将PDF转换为Markdown,图片base64嵌入,无需大模型
- tool_label: PDF转Markdown
- tool_name: pdf_to_markdown
- tool_node_version: '2'
- tool_parameters:
- file:
- type: variable
- value:
- - '1770703294598'
- - file
- type: tool
- height: 140
- id: '1772527425324'
- position:
- x: 1881.4558888576478
- y: 697.8632689662784
- positionAbsolute:
- x: 1881.4558888576478
- y: 697.8632689662784
- selected: true
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- - data:
- outputs:
- - value_selector:
- - '1772527425324'
- - files
- value_type: array[file]
- variable: files
- selected: false
- title: 输出 2
- type: end
- height: 88
- id: '1772779766541'
- position:
- x: 2183.4558888576476
- y: 697.8632689662784
- positionAbsolute:
- x: 2183.4558888576476
- y: 697.8632689662784
- selected: false
- sourcePosition: right
- targetPosition: left
- type: custom
- width: 242
- viewport:
- x: -675.5777822239224
- y: 9.568461206490326
- zoom: 0.7578582832552
- rag_pipeline_variables: []
diff --git a/difyPlugin/需求文档.md b/difyPlugin/需求文档.md
deleted file mode 100644
index 1a83901e..00000000
--- a/difyPlugin/需求文档.md
+++ /dev/null
@@ -1,122 +0,0 @@
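-# Dify Plugin Service Requirements Document
-
-## 1. Project Overview
-
-Develop a Dify plugin service on the FastAPI framework that integrates with the Dify platform, supports deploying and managing multiple plugins, and provides a range of functional extensions.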
-
-## 2. Technology Stack
-
-- **Framework**: FastAPI
-- **Language**: Python 3.9+
-- **Dependency management**: Poetry or pip
-- **Deployment**: Docker containers
-
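-## 3. Project Architecture
-
-### 3.1 Architecture Design
-- **Plugin management system**: manages multiple Dify plugins in one place
-- **Plugin loading mechanism**: supports dynamic loading and hot updates of plugins
-- **Plugin isolation**: each plugin runs in its own environment
-- **API gateway**: a single API entry point that routes requests to the matching plugin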
-
-### 3.2 Directory Structure
-```
-difyPlugin/
-├── main.py # Application entry point
-├── requirements.txt # Dependency list
-├── .env # Environment configuration
-├── app/
-│ ├── api/ # API routes
-│ ├── core/ # Core configuration
-│ ├── plugins/ # Plugin directory
-│ │ ├── plugin1/ # Plugin 1
-│ │ ├── plugin2/ # Plugin 2
-│ │ └── __init__.py # Plugin loader
-│ └── services/ # Shared services
-└── tests/ # Test directory
-```
-
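-### 3.3 Plugin Specification
-- **Plugin structure**: each plugin contains its own configuration, logic, and API
-- **Plugin interface**: a uniform interface specification for all plugins
-- **Plugin registration**: plugins are discovered and registered automatically
-- **Plugin lifecycle**: plugins can be started, stopped, and restarted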
-
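-## 4. Core Features
-
-### 4.1 Basic Features
-- **Health check**: provide a service status endpoint
-- **Version management**: support plugin version control
-- **Authentication**: implement secure authentication with Dify
-- **Plugin management**: support registering, starting, stopping, and uninstalling plugins
-
-### 4.2 Business Features
-- **Data processing**: support converting and processing a variety of data formats
-- **External API integration**: connect to third-party service APIs
-- **Custom logic**: support user-defined business logic
-- **Event handling**: respond to events triggered by the Dify platform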
-
-## 5. Interface Design
-
-### 5.1 Main Endpoints
-- `GET /health`: health check
-- `GET /api/v1/plugins`: list plugins
-- `GET /api/v1/plugins/{plugin_id}`: get plugin details
-- `POST /api/v1/plugins/{plugin_id}/execute`: execute a plugin function
-- `GET /api/v1/plugins/{plugin_id}/metadata`: get plugin metadata
-- `POST /api/v1/plugins/{plugin_id}/start`: start a plugin
-- `POST /api/v1/plugins/{plugin_id}/stop`: stop a plugin
-
-### 5.2 Request/Response Format
-- **Request format**: JSON
-- **Response format**: JSON, containing a status code and data
-
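-## 6. Deployment Requirements
-
-- **Environment variables**: service parameters are configurable through environment variables
-- **Log management**: integrate structured logging
-- **Monitoring metrics**: expose a Prometheus metrics endpoint
-- **Error handling**: thorough error handling and exception capture
-- **Plugin isolation**: support independent deployment and isolation of plugins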
-
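-## 7. Integration
-
-- **Dify plugin registration**: register according to the Dify plugin specification
-- **Webhook configuration**: support Webhook callbacks from the Dify platform
-- **Event subscription**: subscribe to Dify platform events
-- **Plugin discovery**: discover and register new plugins automatically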
-
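-## 8. Development Plan
-
-### 8.1 Phase 1: Project Initialization
-- Create the FastAPI project structure
-- Configure dependency management
-- Implement the plugin management system
-
-### 8.2 Phase 2: Core Feature Development
-- Implement the plugin loading mechanism
-- Develop the plugin interface specification
-- Implement data processing features
-- Integrate external APIs
-
-### 8.3 Phase 3: Testing and Deployment
-- Write unit tests
-- Run integration tests
-- Deploy in containers
-- Develop example plugins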
-
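-## 9. Technical Requirements
-
-- **Code quality**: follow the PEP 8 coding standard
-- **Documentation**: complete API documentation
-- **Performance**: optimize response time and resource usage
-- **Security**: implement secure authentication and authorization
-- **Extensibility**: support adding and removing plugins dynamically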
-
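-## 10. Deliverables
-
-- **Source code**: the complete project code
-- **Deployment documentation**: detailed deployment steps
-- **API documentation**: automatically generated API documentation
-- **Test report**: test results and a coverage report
-- **Plugin development guide**: a guide to developing and registering plugins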
\ No newline at end of file
diff --git a/docs/AI训练资料/2.pdf b/docs/AI训练资料/2.pdf
deleted file mode 100644
index eccbb822..00000000
Binary files a/docs/AI训练资料/2.pdf and /dev/null differ