Update

2026-03-29 11:07:30 +08:00
parent 136ddc270c
commit 140dd3ca35
26 changed files with 4 additions and 2837 deletions
--- a/.gitignore
+++ b/.gitignore
@@ -7,3 +7,4 @@
 **/*.difypkg
 urbanLifeServ/*
 */.data
 docs
--- a/2
+++ b/2
--- a/difyPlugin/DifyCLI.md
+++ b/difyPlugin/DifyCLI.md
@@ -1,146 +0,0 @@
 > ## Documentation Index
 > Fetch the complete documentation index at: https://docs.dify.ai/llms.txt
 > Use this file to discover all available pages before exploring further.
 # CLI
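 > Command-line interface for Dify plugin development
 <Note> ⚠️ This document was automatically translated by AI. If anything is inaccurate, refer to the [English original](/en/develop-plugin/getting-started/cli).</Note>
 Use the command-line interface (CLI) to set up and package your Dify plugins. The CLI streamlines the plugin development workflow, from initialization to packaging.
 This guide walks you through using the CLI for Dify plugin development.
 ## Prerequisites
 Before you begin, make sure the following are installed:
 * Python 3.12 or later
 * Dify CLI
 * Homebrew (for Mac users)
 ## Create a Dify plugin project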
 <Tabs>
  <Tab title="Mac">
    ```bash  theme={null}
    brew tap langgenius/dify
    brew install dify
    ```
  </Tab>
  <Tab title="Linux">
    Get the latest Dify CLI from the [Dify GitHub releases page](https://github.com/langgenius/dify-plugin-daemon/releases)
    ```bash  theme={null}
    # Download dify-plugin-linux-amd64 (the Linux build, not the macOS darwin-arm64 one)
    chmod +x dify-plugin-linux-amd64
    mv dify-plugin-linux-amd64 dify
    sudo mv dify /usr/local/bin/
    ```
  </Tab>
 </Tabs>
 You have now installed the Dify CLI. Verify the installation by running:
 ```bash  theme={null}
 dify version
 ```
 Create a new Dify plugin project with the following command:
 ```bash  theme={null}
 dify plugin init
 ```
 Fill in the required fields as prompted:
 ```bash  theme={null}
 Edit profile of the plugin
 Plugin name (press Enter to next step): hello-world
 Author (press Enter to next step): langgenius
 Description (press Enter to next step): hello world example
 Repository URL (Optional) (press Enter to next step): Repository URL (Optional)
  Enable multilingual README: [✔] English is required by default
 Languages to generate:
    English: [✔] (required)
  → 简体中文 (Simplified Chinese): [✔]
    日本語 (Japanese): [✘]
    Português (Portuguese - Brazil): [✘]
 Controls:
  ↑/↓ Navigate • Space/Tab Toggle selection • Enter Next step
 ```
 Select `python` and press Enter to continue with the Python plugin template.
 ```bash  theme={null}
 Select the type of plugin you want to create, and press `Enter` to continue
 Before starting, here's some basic knowledge about Plugin types in Dify:
 - Tool: Tool Providers like Google Search, Stable Diffusion, etc. Used to perform specific tasks.
 - Model: Model Providers like OpenAI, Anthropic, etc. Use their models to enhance AI capabilities.
 - Endpoint: Similar to Service API in Dify and Ingress in Kubernetes. Extend HTTP services as endpoints with custom logi
 - Agent Strategy: Implement your own agent strategies like Function Calling, ReAct, ToT, CoT, etc.
 Based on the ability you want to extend, Plugins are divided into four types: Tool, Model, Extension, and Agent Strategy
 - Tool: A tool provider that can also implement endpoints. For example, building a Discord Bot requires both Sending and
 - Model: Strictly for model providers, no other extensions allowed.
 - Extension: For simple HTTP services that extend functionality.
 - Agent Strategy: Implement custom agent logic with a focused approach.
 We've provided templates to help you get started. Choose one of the options below:
 -> tool
  agent-strategy
  llm
  text-embedding
  rerank
  tts
  speech2text
  moderation
  extension
 ```
 Enter the minimum Dify version, or leave it blank to use the latest version:
 ```bash  theme={null}
 Edit minimal Dify version requirement, leave it blank by default
 Minimal Dify version (press Enter to next step): 
 ```
 You are all set. The CLI creates a new directory named after the plugin name you provided and sets up the basic structure for your plugin.
 ```bash  theme={null}
 cd hello-world
 ```
 ## Run the plugin
 Make sure you are in the hello-world directory:
 ```bash  theme={null}
 cp .env.example .env
 ```
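 Edit the `.env` file to set the plugin's environment variables, such as the API key and other configuration. You can find these values in the Dify dashboard: log in to your Dify environment, click the "Plugins" icon in the top-right corner, then click the debug icon (it looks like a bug). In the pop-up window, copy the "API Key" and "Host Address". (Refer to the corresponding screenshot in your local copy, which shows the screen for obtaining the key and host address.)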
 ```bash  theme={null}
 INSTALL_METHOD=remote
 REMOTE_INSTALL_HOST=debug-plugin.dify.dev
 REMOTE_INSTALL_PORT=5003
 REMOTE_INSTALL_KEY=********-****-****-****-************
 ```
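As a minimal sketch (not part of the Dify CLI or SDK), the following Python snippet checks that a `.env` file defines the four variables shown above before you start the plugin. The helper names `parse_env` and `missing_keys` are hypothetical, and the parser only handles simple `KEY=VALUE` lines.

```python
# Hypothetical helpers for illustration; not part of the Dify CLI or SDK.
REQUIRED_KEYS = (
    "INSTALL_METHOD",
    "REMOTE_INSTALL_HOST",
    "REMOTE_INSTALL_PORT",
    "REMOTE_INSTALL_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for line in text.splitlines():
        line = line.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_keys(values: dict[str, str]) -> list[str]:
    """Return the required keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]


if __name__ == "__main__":
    # Assumes a .env file in the current directory.
    with open(".env", encoding="utf-8") as handle:
        print(missing_keys(parse_env(handle.read())))
```

An empty list means every expected variable has a value; any names printed still need to be filled in.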
 You can now run your plugin locally with the following commands:
 ```bash  theme={null}
 pip install -r requirements.txt
 python -m main
 ```
--- a/difyPlugin/pdf/.difyignore
+++ b/difyPlugin/pdf/.difyignore
@@ -1,184 +0,0 @@
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
 # Distribution / packaging
 .Python
 build/
 develop-eggs/
 dist/
 downloads/
 eggs/
 .eggs/
 lib/
 lib64/
 parts/
 sdist/
 var/
 wheels/
 share/python-wheels/
 *.egg-info/
 .installed.cfg
 *.egg
 MANIFEST
 # PyInstaller
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
 *.spec
 # Installer logs
 pip-log.txt
 pip-delete-this-directory.txt
 # Unit test / coverage reports
 htmlcov/
 .tox/
 .nox/
 .coverage
 .coverage.*
 .cache
 nosetests.xml
 coverage.xml
 *.cover
 *.py,cover
 .hypothesis/
 .pytest_cache/
 cover/
 # Translations
 *.mo
 *.pot
 # Django stuff:
 *.log
 local_settings.py
 db.sqlite3
 db.sqlite3-journal
 # Flask stuff:
 instance/
 .webassets-cache
 # Scrapy stuff:
 .scrapy
 # Sphinx documentation
 docs/_build/
 # PyBuilder
 .pybuilder/
 target/
 # Jupyter Notebook
 .ipynb_checkpoints
 # IPython
 profile_default/
 ipython_config.py
 # pyenv
 #   For a library or package, you might want to ignore these files since the code is
 #   intended to run in multiple environments; otherwise, check them in:
 .python-version
 # pipenv
 #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 #   install all needed dependencies.
 Pipfile.lock
 # UV
 #   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
 #   This is especially recommended for binary packages to ensure reproducibility, and is more
 #   commonly ignored for libraries.
 uv.lock
 # poetry
 #   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
 #   This is especially recommended for binary packages to ensure reproducibility, and is more
 #   commonly ignored for libraries.
 #   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
 poetry.lock
 # pdm
 #   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
 #pdm.lock
 #   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
 #   in version control.
 #   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
 .pdm.toml
 .pdm-python
 .pdm-build/
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/
 # Celery stuff
 celerybeat-schedule
 celerybeat.pid
 # SageMath parsed files
 *.sage.py
 # Environments
 .env
 .venv
 env/
 venv/
 ENV/
 env.bak/
 venv.bak/
 # Spyder project settings
 .spyderproject
 .spyproject
 # Rope project settings
 .ropeproject
 # mkdocs documentation
 /site
 # mypy
 .mypy_cache/
 .dmypy.json
 dmypy.json
 # Pyre type checker
 .pyre/
 # pytype static type analyzer
 .pytype/
 # Cython debug symbols
 cython_debug/
 # PyCharm
 #  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
 #  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 #  and can be added to the global gitignore or merged into this file.  For a more nuclear
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
 # Vscode
 .vscode/
 # Git
 .git/
 .gitignore
 .github/
 # Mac
 .DS_Store
 # Windows
 Thumbs.db
 # Dify plugin packages
 #  To prevent packaging repetitively
 *.difypkg
--- a/difyPlugin/pdf/.env.example
+++ b/difyPlugin/pdf/.env.example
@@ -1,3 +0,0 @@
 INSTALL_METHOD=remote
 REMOTE_INSTALL_URL=debug.dify.ai:5003
 REMOTE_INSTALL_KEY=********-****-****-****-************
--- a/difyPlugin/pdf/.github/workflows/plugin-publish.yml
+++ b/difyPlugin/pdf/.github/workflows/plugin-publish.yml
@@ -1,109 +0,0 @@
 name: Plugin Publish Workflow
 on:
  release:
    types: [published]
 jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3
      - name: Download CLI tool
        run: |
          mkdir -p $RUNNER_TEMP/bin
          cd $RUNNER_TEMP/bin
          wget https://github.com/langgenius/dify-plugin-daemon/releases/download/0.0.6/dify-plugin-linux-amd64
          chmod +x dify-plugin-linux-amd64
          echo "CLI tool location:"
          pwd
          ls -la dify-plugin-linux-amd64
      - name: Get basic info from manifest
        id: get_basic_info
        run: |
          PLUGIN_NAME=$(grep "^name:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin name: $PLUGIN_NAME"
          echo "plugin_name=$PLUGIN_NAME" >> $GITHUB_OUTPUT
          VERSION=$(grep "^version:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin version: $VERSION"
          echo "version=$VERSION" >> $GITHUB_OUTPUT
          # If the author's name is not your github username, you can change the author here
          AUTHOR=$(grep "^author:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin author: $AUTHOR"
          echo "author=$AUTHOR" >> $GITHUB_OUTPUT
      - name: Package Plugin
        id: package
        run: |
          cd $GITHUB_WORKSPACE
          PACKAGE_NAME="${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}.difypkg"
          $RUNNER_TEMP/bin/dify-plugin-linux-amd64 plugin package . -o "$PACKAGE_NAME"
          echo "Package result:"
          ls -la "$PACKAGE_NAME"
          echo "package_name=$PACKAGE_NAME" >> $GITHUB_OUTPUT
          printf '\nFull file path:\n'
          pwd
          printf '\nDirectory structure:\n'
          tree || ls -R
      - name: Checkout target repo
        uses: actions/checkout@v3
        with:
          repository: ${{steps.get_basic_info.outputs.author}}/dify-plugins
          path: dify-plugins
          token: ${{ secrets.PLUGIN_ACTION }}
          fetch-depth: 1
          persist-credentials: true
      - name: Prepare and create PR
        run: |
          PACKAGE_NAME="${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}.difypkg"
          mkdir -p dify-plugins/${{ steps.get_basic_info.outputs.author }}/${{ steps.get_basic_info.outputs.plugin_name }}
          mv "$PACKAGE_NAME" dify-plugins/${{ steps.get_basic_info.outputs.author }}/${{ steps.get_basic_info.outputs.plugin_name }}/
          cd dify-plugins
          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"
          git fetch origin main
          git checkout main
          git pull origin main
          BRANCH_NAME="bump-${{ steps.get_basic_info.outputs.plugin_name }}-plugin-${{ steps.get_basic_info.outputs.version }}"
          git checkout -b "$BRANCH_NAME"
          git add .
          git commit -m "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin to version ${{ steps.get_basic_info.outputs.version }}"
          git push -u origin "$BRANCH_NAME" --force
          git branch -a
          echo "Waiting for branch to sync..."
          sleep 10  # Wait 10 seconds for branch sync
      - name: Create PR via GitHub API
        env:
          # How to config the token:
          # 1. Profile -> Settings -> Developer settings -> Personal access tokens -> Generate new token (with repo scope) -> Copy the token
          # 2. Go to the target repository -> Settings -> Secrets and variables -> Actions -> New repository secret -> Add the token as PLUGIN_ACTION
          GH_TOKEN: ${{ secrets.PLUGIN_ACTION }}
        run: |
          gh pr create \
            --repo langgenius/dify-plugins \
            --head "${{ steps.get_basic_info.outputs.author }}:bump-${{ steps.get_basic_info.outputs.plugin_name }}-plugin-${{ steps.get_basic_info.outputs.version }}" \
            --base main \
            --title "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin to version ${{ steps.get_basic_info.outputs.version }}" \
            --body "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin package to version ${{ steps.get_basic_info.outputs.version }}
            Changes:
            - Updated plugin package file" || echo "PR already exists or creation skipped." # Handle cases where PR already exists
--- a/difyPlugin/pdf/.gitignore
+++ b/difyPlugin/pdf/.gitignore
@@ -1,176 +0,0 @@
 # Byte-compiled / optimized / DLL files
 __pycache__/
 *.py[cod]
 *$py.class
 # C extensions
 *.so
 # Distribution / packaging
 .Python
 build/
 develop-eggs/
 dist/
 downloads/
 eggs/
 .eggs/
 lib/
 lib64/
 parts/
 sdist/
 var/
 wheels/
 share/python-wheels/
 *.egg-info/
 .installed.cfg
 *.egg
 MANIFEST
 # PyInstaller
 #  Usually these files are written by a python script from a template
 #  before PyInstaller builds the exe, so as to inject date/other infos into it.
 *.manifest
 *.spec
 # Installer logs
 pip-log.txt
 pip-delete-this-directory.txt
 # Unit test / coverage reports
 htmlcov/
 .tox/
 .nox/
 .coverage
 .coverage.*
 .cache
 nosetests.xml
 coverage.xml
 *.cover
 *.py,cover
 .hypothesis/
 .pytest_cache/
 cover/
 # Translations
 *.mo
 *.pot
 # Django stuff:
 *.log
 local_settings.py
 db.sqlite3
 db.sqlite3-journal
 # Flask stuff:
 instance/
 .webassets-cache
 # Scrapy stuff:
 .scrapy
 # Sphinx documentation
 docs/_build/
 # PyBuilder
 .pybuilder/
 target/
 # Jupyter Notebook
 .ipynb_checkpoints
 # IPython
 profile_default/
 ipython_config.py
 # pyenv
 #   For a library or package, you might want to ignore these files since the code is
 #   intended to run in multiple environments; otherwise, check them in:
 # .python-version
 # pipenv
 #   According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
 #   However, in case of collaboration, if having platform-specific dependencies or dependencies
 #   having no cross-platform support, pipenv may install dependencies that don't work, or not
 #   install all needed dependencies.
 #Pipfile.lock
 # UV
 #   Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
 #   This is especially recommended for binary packages to ensure reproducibility, and is more
 #   commonly ignored for libraries.
 #uv.lock
 # poetry
 #   Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
 #   This is especially recommended for binary packages to ensure reproducibility, and is more
 #   commonly ignored for libraries.
 #   https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
 #poetry.lock
 # pdm
 #   Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
 #pdm.lock
 #   pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
 #   in version control.
 #   https://pdm.fming.dev/latest/usage/project/#working-with-version-control
 .pdm.toml
 .pdm-python
 .pdm-build/
 # PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
 __pypackages__/
 # Celery stuff
 celerybeat-schedule
 celerybeat.pid
 # SageMath parsed files
 *.sage.py
 # Environments
 .env
 .venv
 env/
 venv/
 ENV/
 env.bak/
 venv.bak/
 # Spyder project settings
 .spyderproject
 .spyproject
 # Rope project settings
 .ropeproject
 # mkdocs documentation
 /site
 # mypy
 .mypy_cache/
 .dmypy.json
 dmypy.json
 # Pyre type checker
 .pyre/
 # pytype static type analyzer
 .pytype/
 # Cython debug symbols
 cython_debug/
 # PyCharm
 #  JetBrains specific template is maintained in a separate JetBrains.gitignore that can
 #  be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
 #  and can be added to the global gitignore or merged into this file.  For a more nuclear
 #  option (not recommended) you can uncomment the following to ignore the entire idea folder.
 .idea/
 # Vscode
 .vscode/
 # macOS
 .DS_Store
 .AppleDouble
 .LSOverride
--- a/difyPlugin/pdf/GUIDE.md
+++ b/difyPlugin/pdf/GUIDE.md
@@ -1,137 +0,0 @@
 # Dify Plugin Development Guide
 Welcome to Dify plugin development! This guide will help you get started quickly.
 ## Plugin Types
 Dify plugins extend three main capabilities:
 | Type | Description | Example |
 |------|-------------|---------|
 | **Tool** | Perform specific tasks | Google Search, Stable Diffusion |
 | **Model** | AI model integrations | OpenAI, Anthropic |
 | **Endpoint** | HTTP services | Custom APIs, integrations |
 You can create:
 - **Tool**: Tool provider with optional endpoints (e.g., Discord bot)
 - **Model**: Model provider only
 - **Extension**: Simple HTTP service
 ## Setup
 ### Requirements
 - Python 3.12+ (matches the `runner.version` declared in the manifest)
 - Dependencies: `pip install -r requirements.txt`
 ## Development Process
 <details>
 <summary><b>1. Manifest Structure</b></summary>
 Edit `manifest.yaml` to describe your plugin:
 ```yaml
 version: 0.1.0                  # Required: Plugin version
 type: plugin                    # Required: plugin or bundle
 author: YourOrganization        # Required: Organization name
 label:                          # Required: Multi-language names
  en_US: Plugin Name
  zh_Hans: 插件名称
 created_at: 2023-01-01T00:00:00Z # Required: Creation time (RFC3339)
 icon: assets/icon.png           # Required: Icon path
 # Resources and permissions
 resource:
  memory: 268435456            # Max memory (bytes)
  permission:
    tool:
      enabled: true            # Tool permission
    model:
      enabled: true            # Model permission
      llm: true
      text_embedding: false
      # Other model types...
    # Other permissions...
 # Extensions definition
 plugins:
  tools:
    - tools/my_tool.yaml       # Tool definition files
  models:
    - models/my_model.yaml     # Model definition files
  endpoints:
    - endpoints/my_api.yaml    # Endpoint definition files
 # Runtime metadata
 meta:
  version: 0.0.1               # Manifest format version
  arch:
    - amd64
    - arm64
  runner:
    language: python
    version: "3.12"
    entrypoint: main
 ```
 **Restrictions:**
 - Cannot extend both tools and models
 - Must have at least one extension
 - Cannot extend both models and endpoints
 - Limited to one supplier per extension type
 </details>
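The restrictions listed for the manifest can be sketched as a small Python check. This is an illustrative helper, not part of the Dify SDK: the function name `check_restrictions` is hypothetical, `manifest` is assumed to be the already-parsed `manifest.yaml` as a plain dict, and each entry under `plugins.tools`, `plugins.models`, or `plugins.endpoints` is assumed to count as one supplier.

```python
# Hypothetical validator for illustration; not part of the Dify SDK.
def check_restrictions(manifest: dict) -> list[str]:
    """Return the violated restrictions as messages (empty list if none)."""
    plugins = manifest.get("plugins") or {}
    kinds = {kind: plugins.get(kind) or [] for kind in ("tools", "models", "endpoints")}

    problems: list[str] = []
    if not any(kinds.values()):
        problems.append("must have at least one extension")
    if kinds["tools"] and kinds["models"]:
        problems.append("cannot extend both tools and models")
    if kinds["models"] and kinds["endpoints"]:
        problems.append("cannot extend both models and endpoints")
    for kind, entries in kinds.items():
        # Assumption: one list entry corresponds to one supplier.
        if len(entries) > 1:
            problems.append(f"limited to one supplier for {kind}")
    return problems


if __name__ == "__main__":
    print(check_restrictions({"plugins": {"tools": ["provider/pdf.yaml"]}}))
```

Returning messages instead of raising lets a caller report every violation at once.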
 <details>
 <summary><b>2. Implementation Examples</b></summary>
 Study these examples to understand plugin implementation:
 - [OpenAI](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/openai) - Model provider
 - [Google Search](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/google) - Tool provider
 - [Neko](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/neko) - Endpoint group
 </details>
 <details>
 <summary><b>3. Testing & Debugging</b></summary>
 1. Copy `.env.example` to `.env` and configure:
   ```
   INSTALL_METHOD=remote
   REMOTE_INSTALL_URL=debug.dify.ai:5003
   REMOTE_INSTALL_KEY=your-debug-key
   ```
 2. Run your plugin: 
   ```bash
   python -m main
   ```
 3. Refresh your Dify instance to see the plugin (marked as "debugging")
 </details>
 <details>
 <summary><b>4. Publishing</b></summary>
 #### Manual Packaging
 ```bash
 dify-plugin plugin package ./YOUR_PLUGIN_DIR
 ```
 #### Automated GitHub Workflow
 Configure GitHub Actions to automate PR creation:
 1. Create a Personal Access Token for your forked repository
 2. Add it as `PLUGIN_ACTION` secret in your source repo
 3. Create `.github/workflows/plugin-publish.yml`
 When you create a release, the action will:
 - Package your plugin
 - Push the package to a branch on your fork and open a PR against `langgenius/dify-plugins`
 [Detailed workflow documentation](https://docs.dify.ai/plugins/publish-plugins/plugin-auto-publish-pr)
 </details>
 ## Privacy Policy
 If publishing to the Marketplace, provide a privacy policy in [PRIVACY.md](PRIVACY.md).
--- a/difyPlugin/pdf/PRIVACY.md
+++ b/difyPlugin/pdf/PRIVACY.md
@@ -1,3 +0,0 @@
 ## Privacy
 !!! Please fill in the privacy policy of the plugin.
--- a/difyPlugin/pdf/README.md
+++ b/difyPlugin/pdf/README.md
@@ -1,10 +0,0 @@
 ## pdf
 **Author:** yslg
 **Version:** 0.0.1
 **Type:** tool
 ### Description
--- a/difyPlugin/pdf/_assets/icon-dark.svg
+++ b/difyPlugin/pdf/_assets/icon-dark.svg
@@ -1,55 +0,0 @@
 <!--
  ~ Dify Marketplace Template Icon
  ~ Dify 市场模板图标
  ~ Dify マーケットプレイステンプレートアイコン
  ~
  ~ WARNING / 警告 / 警告:
  ~ 
  ~ English: This is a TEMPLATE icon from Dify Marketplace only. You MUST NOT use this default icon in any way.
  ~ Please replace it with your own custom icon before submitting this plugin.
  ~ 
  ~ 中文: 这只是来自 Dify 市场的模板图标。您绝对不能以任何方式使用此默认图标。
  ~ 请在提交此插件之前将其替换为您自己的自定义图标。
  ~ 
  ~ 日本語: これは Dify マーケットプレイスのテンプレートアイコンです。このデフォルトアイコンをいかなる方法でも使用してはいけません。
  ~ このプラグインを提出する前に、独自のカスタムアイコンに置き換えてください。
  ~ 
  ~ DIFY_MARKETPLACE_TEMPLATE_ICON_DO_NOT_USE
  -->
 <svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg">
 <g clip-path="url(#clip0_15253_95095)">
 <rect width="40" height="40" fill="#0033FF"/>
 <g filter="url(#filter0_n_15253_95095)">
 <rect width="40" height="40" fill="url(#paint0_linear_15253_95095)"/>
 </g>
 <path d="M28 10C28.5523 10 29 10.4477 29 11V16C29 16.5523 28.5523 17 28 17H23V30C23 30.5523 22.5523 31 22 31H18C17.4477 31 17 30.5523 17 30V17H11.5C10.9477 17 10.5 16.5523 10.5 16V13.618C10.5 13.2393 10.714 12.893 11.0528 12.7236L16.5 10H28ZM23 12H16.9721L12.5 14.2361V15H19V29H21V15H23V12ZM27 12H25V15H27V12Z" fill="white"/>
 </g>
 <defs>
 <filter id="filter0_n_15253_95095" x="0" y="0" width="40" height="40" filterUnits="userSpaceOnUse" color-interpolation-filters="sRGB">
 <feFlood flood-opacity="0" result="BackgroundImageFix"/>
 <feBlend mode="normal" in="SourceGraphic" in2="BackgroundImageFix" result="shape"/>
 <feTurbulence type="fractalNoise" baseFrequency="2 2" stitchTiles="stitch" numOctaves="3" result="noise" seed="8033" />
 <feComponentTransfer in="noise" result="coloredNoise1">
 <feFuncR type="linear" slope="2" intercept="-0.5" />
 <feFuncG type="linear" slope="2" intercept="-0.5" />
 <feFuncB type="linear" slope="2" intercept="-0.5" />
 <feFuncA type="discrete" tableValues="1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "/>
 </feComponentTransfer>
 <feComposite operator="in" in2="shape" in="coloredNoise1" result="noise1Clipped" />
 <feComponentTransfer in="noise1Clipped" result="color1">
 <feFuncA type="table" tableValues="0 0.06" />
 </feComponentTransfer>
 <feMerge result="effect1_noise_15253_95095">
 <feMergeNode in="shape" />
 <feMergeNode in="color1" />
 </feMerge>
 </filter>
 <linearGradient id="paint0_linear_15253_95095" x1="0" y1="0" x2="40" y2="40" gradientUnits="userSpaceOnUse">
 <stop stop-color="#1443FF"/>
 <stop offset="1" stop-color="#0031F5"/>
 </linearGradient>
 <clipPath id="clip0_15253_95095">
 <rect width="40" height="40" fill="white"/>
 </clipPath>
 </defs>
 </svg>
--- a/difyPlugin/pdf/_assets/icon.svg
+++ b/difyPlugin/pdf/_assets/icon.svg
@@ -1,55 +0,0 @@
 <!--
  ~ Dify Marketplace Template Icon
  ~ Dify 市场模板图标
  ~ Dify マーケットプレイステンプレートアイコン
  ~
  ~ WARNING / 警告 / 警告:
  ~ 
  ~ English: This is a TEMPLATE icon from Dify Marketplace only. You MUST NOT use this default icon in any way.
  ~ Please replace it with your own custom icon before submitting this plugin.
  ~ 
  ~ 中文: 这只是来自 Dify 市场的模板图标。您绝对不能以任何方式使用此默认图标。
  ~ 请在提交此插件之前将其替换为您自己的自定义图标。
  ~ 
  ~ 日本語: これは Dify マーケットプレイスのテンプレートアイコンです。このデフォルトアイコンをいかなる方法でも使用してはいけません。
  ~ このプラグインを提出する前に、独自のカスタムアイコンに置き換えてください。
  ~ 
  ~ DIFY_MARKETPLACE_TEMPLATE_ICON_DO_NOT_USE
  -->
 <svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg">
 <g clip-path="url(#clip0_15255_46435)">
 <rect width="40" height="40" fill="#0033FF"/>
 <g filter="url(#filter0_n_15255_46435)">
 <rect width="40" height="40" fill="url(#paint0_linear_15255_46435)"/>
 </g>
 <path d="M28 10C28.5523 10 29 10.4477 29 11V16C29 16.5523 28.5523 17 28 17H23V30C23 30.5523 22.5523 31 22 31H18C17.4477 31 17 30.5523 17 30V17H11.5C10.9477 17 10.5 16.5523 10.5 16V13.618C10.5 13.2393 10.714 12.893 11.0528 12.7236L16.5 10H28ZM23 12H16.9721L12.5 14.2361V15H19V29H21V15H23V12ZM27 12H25V15H27V12Z" fill="white"/>
 </g>
 <defs>
 <filter id="filter0_n_15255_46435" x="0" y="0" width="40" height="40" filterUnits="userSpaceOnUse" color-interpolation-filters="sRGB">
 <feFlood flood-opacity="0" result="BackgroundImageFix"/>
 <feBlend mode="normal" in="SourceGraphic" in2="BackgroundImageFix" result="shape"/>
 <feTurbulence type="fractalNoise" baseFrequency="2 2" stitchTiles="stitch" numOctaves="3" result="noise" seed="8033" />
 <feComponentTransfer in="noise" result="coloredNoise1">
 <feFuncR type="linear" slope="2" intercept="-0.5" />
 <feFuncG type="linear" slope="2" intercept="-0.5" />
 <feFuncB type="linear" slope="2" intercept="-0.5" />
 <feFuncA type="discrete" tableValues="1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "/>
 </feComponentTransfer>
 <feComposite operator="in" in2="shape" in="coloredNoise1" result="noise1Clipped" />
 <feComponentTransfer in="noise1Clipped" result="color1">
 <feFuncA type="table" tableValues="0 0.06" />
 </feComponentTransfer>
 <feMerge result="effect1_noise_15255_46435">
 <feMergeNode in="shape" />
 <feMergeNode in="color1" />
 </feMerge>
 </filter>
 <linearGradient id="paint0_linear_15255_46435" x1="0" y1="0" x2="40" y2="40" gradientUnits="userSpaceOnUse">
 <stop stop-color="#1F4CFF"/>
 <stop offset="1" stop-color="#0033FF"/>
 </linearGradient>
 <clipPath id="clip0_15255_46435">
 <rect width="40" height="40" fill="white"/>
 </clipPath>
 </defs>
 </svg>
--- a/difyPlugin/pdf/main.py
+++ b/difyPlugin/pdf/main.py
@@ -1,6 +0,0 @@
 from dify_plugin import Plugin, DifyPluginEnv
 plugin = Plugin(DifyPluginEnv(MAX_REQUEST_TIMEOUT=120))
 if __name__ == '__main__':
    plugin.run()
--- a/difyPlugin/pdf/manifest.yaml
+++ b/difyPlugin/pdf/manifest.yaml
@@ -1,40 +0,0 @@
 version: 0.0.1
 type: plugin
 author: yslg
 name: pdf
 label:
  en_US: pdf
  ja_JP: pdf
  zh_Hans: pdf
  pt_BR: pdf
 description:
  en_US: pdfTools
  ja_JP: pdfTools
  zh_Hans: pdfTools
  pt_BR: pdfTools
 icon: icon.svg
 icon_dark: icon-dark.svg
 resource:
  memory: 268435456
  permission:
    tool:
      enabled: true
    model:
      enabled: true
      llm: true
 plugins:
  tools:
    - provider/pdf.yaml
 meta:
  version: 0.0.1
  arch:
    - amd64
    - arm64
  runner:
    language: python
    version: "3.12"
    entrypoint: main
  minimum_dify_version: null
 created_at: 2026-03-02T13:21:03.2806864+08:00
 privacy: PRIVACY.md
 verified: false
--- a/difyPlugin/pdf/plugin.json
+++ b/difyPlugin/pdf/plugin.json
@@ -1,64 +0,0 @@
 {
  "name": "pdf-plugin",
  "version": "1.0.0",
  "description": "PDF plugin for analyzing table of contents and extracting text",
  "author": "System",
  "type": "tool",
  "main": "main.py",
  "requirements": "requirements.txt",
  "icon": "https://neeko-copilot.bytedance.net/api/text2image?prompt=PDF%20document%20icon&size=square",
  "settings": [
    {
      "key": "debug",
      "type": "boolean",
      "default": false,
      "description": "Enable debug mode"
    }
  ],
  "functions": [
    {
      "name": "analyze_toc",
      "description": "Analyze PDF and find table of contents",
      "parameters": {
        "type": "object",
        "properties": {
          "file": {
            "type": "file",
            "description": "PDF file to analyze",
            "fileTypes": ["pdf"]
          }
        },
        "required": ["file"]
      }
    },
    {
      "name": "extract_text",
      "description": "Extract text from specified page range",
      "parameters": {
        "type": "object",
        "properties": {
          "file": {
            "type": "file",
            "description": "PDF file to extract text from",
            "fileTypes": ["pdf"]
          },
          "page_range": {
            "type": "object",
            "properties": {
              "start": {
                "type": "integer",
                "default": 0,
                "description": "Start page index"
              },
              "end": {
                "type": "integer",
                "description": "End page index"
              }
            }
          }
        },
        "required": ["file"]
      }
    }
  ]
 }
--- a/difyPlugin/pdf/provider/pdf.py
+++ b/difyPlugin/pdf/provider/pdf.py
@@ -1,53 +0,0 @@
 from typing import Any
 from dify_plugin import ToolProvider
 from dify_plugin.errors.tool import ToolProviderCredentialValidationError
 class PdfProvider(ToolProvider):
    def _validate_credentials(self, credentials: dict[str, Any]) -> None:
        try:
            """
            IMPLEMENT YOUR VALIDATION HERE
            """
        except Exception as e:
            raise ToolProviderCredentialValidationError(str(e))
    #########################################################################################
    # If OAuth is supported, uncomment the following functions.
    # Warning: please make sure that the sdk version is 0.4.2 or higher.
    #########################################################################################
    # def _oauth_get_authorization_url(self, redirect_uri: str, system_credentials: Mapping[str, Any]) -> str:
    #     """
    #     Generate the authorization URL for pdf OAuth.
    #     """
    #     try:
    #         """
    #         IMPLEMENT YOUR AUTHORIZATION URL GENERATION HERE
    #         """
    #     except Exception as e:
    #         raise ToolProviderOAuthError(str(e))
    #     return ""
    # def _oauth_get_credentials(
    #     self, redirect_uri: str, system_credentials: Mapping[str, Any], request: Request
    # ) -> Mapping[str, Any]:
    #     """
    #     Exchange code for access_token.
    #     """
    #     try:
    #         """
    #         IMPLEMENT YOUR CREDENTIALS EXCHANGE HERE
    #         """
    #     except Exception as e:
    #         raise ToolProviderOAuthError(str(e))
    #     return dict()
    # def _oauth_refresh_credentials(
    #     self, redirect_uri: str, system_credentials: Mapping[str, Any], credentials: Mapping[str, Any]
    # ) -> OAuthCredentials:
    #     """
    #     Refresh the credentials
    #     """
    #     return OAuthCredentials(credentials=credentials, expires_at=-1)
--- a/difyPlugin/pdf/provider/pdf.yaml
+++ b/difyPlugin/pdf/provider/pdf.yaml
@@ -1,21 +0,0 @@
 identity:
  author: "yslg"
  name: "pdf"
  label:
    en_US: "pdf"
    zh_Hans: "pdf"
    pt_BR: "pdf"
    ja_JP: "pdf"
  description:
    en_US: "pdfTools"
    zh_Hans: "pdfTools"
    pt_BR: "pdfTools"
    ja_JP: "pdfTools"
  icon: "icon.svg"
 tools:
  - tools/pdf_toc.yaml
  - tools/pdf_to_markdown.yaml
 extra:
  python:
    source: provider/pdf.py
--- a/difyPlugin/pdf/requirements.txt
+++ b/difyPlugin/pdf/requirements.txt
@@ -1,2 +0,0 @@
 dify_plugin>=0.4.0,<0.7.0
 pymupdf>=1.27.1
--- a/difyPlugin/pdf/tools/pdf_to_markdown.py
+++ b/difyPlugin/pdf/tools/pdf_to_markdown.py
@@ -1,234 +0,0 @@
 import json
 import re
 from collections.abc import Generator
 from typing import Any
 import fitz
 from dify_plugin import Tool
 from dify_plugin.entities.tool import ToolInvokeMessage
 class PdfToMarkdownTool(Tool):
    """Convert PDF to Markdown using an external catalog array."""
    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
        file = tool_parameters.get("file")
        catalog_text = (tool_parameters.get("catalog") or "").strip()
        if not file:
            yield self.create_text_message("Error: file is required")
            return
        if not catalog_text:
            yield self.create_text_message("Error: catalog is required")
            return
        catalog = self._parse_catalog(catalog_text)
        if not catalog:
            yield self.create_text_message("Error: catalog must be a JSON array with title and page indexes")
            return
        doc = fitz.open(stream=file.blob, filetype="pdf")
        try:
            num_pages = len(doc)
            hf_texts = self._detect_headers_footers(doc, num_pages)
            page_mds = [self._page_to_markdown(doc[index], hf_texts) for index in range(num_pages)]
            final_md = self._assemble_by_catalog(catalog, page_mds, num_pages)
            yield self.create_text_message(final_md)
            yield self.create_blob_message(
                blob=final_md.encode("utf-8"),
                meta={"mime_type": "text/markdown"},
            )
        finally:
            doc.close()
    def _parse_catalog(self, catalog_text: str) -> list[dict[str, Any]]:
        try:
            raw = json.loads(catalog_text)
        except Exception:
            return []
        if not isinstance(raw, list):
            return []
        result: list[dict[str, Any]] = []
        for item in raw:
            if not isinstance(item, dict):
                continue
            title = str(item.get("title") or "").strip() or "Untitled"
            start_index = self._to_int(item.get("page_start_index"), None)
            end_index = self._to_int(item.get("page_end_index"), start_index)
            if start_index is None:
                start = self._to_int(item.get("start"), None)
                end = self._to_int(item.get("end"), start)
                if start is None:
                    continue
                start_index = max(0, start - 1)
                end_index = max(start_index, (end if end is not None else start) - 1)
            if end_index is None:
                end_index = start_index
            result.append(
                {
                    "title": title,
                    "page_start_index": max(0, start_index),
                    "page_end_index": max(start_index, end_index),
                }
            )
        return result
    def _detect_headers_footers(self, doc: fitz.Document, num_pages: int) -> set[str]:
        margin_ratio = 0.08
        sample_count = min(num_pages, 30)
        text_counts: dict[str, int] = {}
        for idx in range(sample_count):
            page = doc[idx]
            page_height = page.rect.height
            top_limit = page_height * margin_ratio
            bottom_limit = page_height * (1 - margin_ratio)
            try:
                blocks = page.get_text("blocks", sort=True) or []
            except Exception:
                continue
            seen: set[str] = set()
            for block in blocks:
                if len(block) < 7 or block[6] != 0:
                    continue
                y0, y1 = block[1], block[3]
                text = (block[4] or "").strip()
                if not text or len(text) < 2 or text in seen:
                    continue
                if y1 <= top_limit or y0 >= bottom_limit:
                    seen.add(text)
                    text_counts[text] = text_counts.get(text, 0) + 1
        threshold = max(3, sample_count * 0.35)
        return {text for text, count in text_counts.items() if count >= threshold}
    def _page_to_markdown(self, page: fitz.Page, hf_texts: set[str]) -> str:
        parts: list[str] = []
        page_height = page.rect.height
        top_margin = page_height * 0.06
        bottom_margin = page_height * 0.94
        table_rects: list[fitz.Rect] = []
        table_mds: list[str] = []
        try:
            find_tables = getattr(page, "find_tables", None)
            tables = []
            if callable(find_tables):
                table_finder = find_tables()
                tables = getattr(table_finder, "tables", []) or []
            for table in tables[:5]:
                try:
                    table_rects.append(fitz.Rect(table.bbox))
                except Exception:
                    pass
                cells = table.extract() or []
                if len(cells) < 2:
                    continue
                if hf_texts and len(cells) <= 3:
                    flat = " ".join(str(cell or "") for row in cells for cell in row)
                    if any(hf in flat for hf in hf_texts):
                        continue
                md_table = self._cells_to_md_table(cells)
                if md_table:
                    table_mds.append(md_table)
        except Exception:
            pass
        try:
            blocks = page.get_text("blocks", sort=True) or []
        except Exception:
            blocks = []
        for block in blocks:
            if len(block) < 7 or block[6] != 0:
                continue
            x0, y0, x1, y1 = block[:4]
            text = (block[4] or "").strip()
            if not text:
                continue
            block_rect = fitz.Rect(x0, y0, x1, y1)
            if any(self._rects_overlap(block_rect, table_rect) for table_rect in table_rects):
                continue
            if hf_texts and (y1 <= top_margin or y0 >= bottom_margin):
                if any(hf in text for hf in hf_texts):
                    continue
            if re.fullmatch(r"\s*\d{1,4}\s*", text):
                continue
            parts.append(text)
        parts.extend(table_mds)
        return "\n\n".join(parts)
    def _assemble_by_catalog(self, catalog: list[dict[str, Any]], page_mds: list[str], num_pages: int) -> str:
        parts: list[str] = []
        used_pages: set[int] = set()
        for item in catalog:
            start = max(0, min(int(item["page_start_index"]), num_pages - 1))
            end = max(start, min(int(item["page_end_index"]), num_pages - 1))
            chapter_parts = [f"# {item['title']}\n"]
            for idx in range(start, end + 1):
                if idx < len(page_mds) and page_mds[idx].strip() and idx not in used_pages:
                    chapter_parts.append(page_mds[idx])
                    used_pages.add(idx)
            if len(chapter_parts) > 1:
                parts.append("\n\n".join(chapter_parts))
        if parts:
            return "\n\n---\n\n".join(parts)
        return "\n\n---\n\n".join(m for m in page_mds if m.strip())
    @staticmethod
    def _rects_overlap(block_rect: fitz.Rect, table_rect: fitz.Rect) -> bool:
        inter = block_rect & table_rect
        if inter.is_empty:
            return False
        block_area = block_rect.width * block_rect.height
        if block_area <= 0:
            return False
        return (inter.width * inter.height) / block_area >= 0.3
    @staticmethod
    def _cells_to_md_table(cells: list) -> str:
        if not cells:
            return ""
        header = cells[0]
        ncols = len(header)
        if ncols == 0:
            return ""
        def clean(value: Any) -> str:
            return str(value or "").replace("|", "\\|").replace("\n", " ").strip()
        lines = [
            "| " + " | ".join(clean(cell) for cell in header) + " |",
            "| " + " | ".join("---" for _ in range(ncols)) + " |",
        ]
        for row in cells[1:]:
            padded = list(row) + [""] * max(0, ncols - len(row))
            lines.append("| " + " | ".join(clean(cell) for cell in padded[:ncols]) + " |")
        return "\n".join(lines)
    @staticmethod
    def _to_int(value: Any, default: int | None) -> int | None:
        try:
            if value is None or value == "":
                return default
            return int(value)
        except Exception:
            return default
--- a/difyPlugin/pdf/tools/pdf_to_markdown.yaml
+++ b/difyPlugin/pdf/tools/pdf_to_markdown.yaml
@@ -1,51 +0,0 @@
 identity:
  name: "pdf_to_markdown"
  author: "yslg"
  label:
    en_US: "PDF to Markdown"
    zh_Hans: "PDF to Markdown"
    pt_BR: "PDF para Markdown"
    ja_JP: "PDF to Markdown"
 description:
  human:
    en_US: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
    zh_Hans: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
    pt_BR: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
    ja_JP: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
  llm: "Convert a PDF file into Markdown using a catalog JSON array. Ignore images and graphics."
 parameters:
  - name: file
    type: file
    required: true
    label:
      en_US: PDF File
      zh_Hans: PDF File
      pt_BR: PDF File
      ja_JP: PDF File
    human_description:
      en_US: "PDF file to convert"
      zh_Hans: "PDF file to convert"
      pt_BR: "PDF file to convert"
      ja_JP: "PDF file to convert"
    llm_description: "PDF file to convert to Markdown"
    form: llm
    fileTypes:
      - "pdf"
  - name: catalog
    type: string
    required: true
    label:
      en_US: Catalog JSON
      zh_Hans: Catalog JSON
      pt_BR: Catalog JSON
      ja_JP: Catalog JSON
    human_description:
      en_US: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
      zh_Hans: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
      pt_BR: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
      ja_JP: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
    llm_description: "Catalog JSON array returned by pdf_toc"
    form: llm
 extra:
  python:
    source: tools/pdf_to_markdown.py
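For review context, the `catalog` parameter described above accepts either 1-based `start`/`end` page numbers or 0-based `page_start_index`/`page_end_index` values. The sketch below mirrors the normalization done in `_parse_catalog`. The helper name `normalize_item` and the sample entries are assumptions for illustration, not part of the plugin.

```python
import json
from typing import Any, Dict


def normalize_item(item: Dict[str, Any]) -> Dict[str, Any]:
    """Hypothetical helper: reduce one catalog entry to 0-based page indexes."""
    start_raw = item.get("page_start_index")
    if start_raw is not None:
        # Explicit 0-based indexes take precedence.
        start_index = int(start_raw)
        end_raw = item.get("page_end_index")
        end_index = start_index if end_raw is None else int(end_raw)
    else:
        # Otherwise convert 1-based printed positions to 0-based indexes.
        start = int(item["start"])
        end_raw = item.get("end")
        end = start if end_raw is None else int(end_raw)
        start_index = max(0, start - 1)
        end_index = max(start_index, end - 1)
    return {
        "title": str(item.get("title") or "").strip() or "Untitled",
        "page_start_index": max(0, start_index),
        "page_end_index": max(start_index, end_index),
    }


# Sample catalog text, as it might be passed in the `catalog` parameter.
catalog_text = json.dumps([
    {"title": "Introduction", "start": 1, "end": 3},
    {"title": "Methods", "page_start_index": 3, "page_end_index": 7},
])
normalized = [normalize_item(item) for item in json.loads(catalog_text)]
```

Unlike the plugin code, this sketch raises on malformed numbers instead of skipping the entry, which keeps the example short.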
--- a/difyPlugin/pdf/tools/pdf_toc.py
+++ b/difyPlugin/pdf/tools/pdf_toc.py
@@ -1,312 +0,0 @@
 import json
 import re
 from collections import OrderedDict
 from collections.abc import Generator
 from typing import Any
 import fitz
 from dify_plugin import Tool
 from dify_plugin.entities.model.llm import LLMModelConfig
 from dify_plugin.entities.model.message import SystemPromptMessage, UserPromptMessage
 from dify_plugin.entities.tool import ToolInvokeMessage
 _TOC_SYSTEM_PROMPT = """你是专业的PDF目录解析助手。请从以下PDF文本中提取文档的目录/章节结构。
 要求：
 1. 识别所有一级和二级标题及其对应的页码
 2. 只返回纯JSON数组，不要markdown代码块，不要任何解释
 3. 格式: [{"title": "章节标题", "page": 页码数字}]
 4. 页码必须是文档中标注的实际页码数字
 5. 如果无法识别目录，返回空数组 []"""
 class PdfTocTool(Tool):
    _TOC_PATTERNS = [
        r"目录",
        r"目\s*录",
        r"目\u3000录",
        r"Table of Contents",
        r"Contents",
        r"目次",
    ]
    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
        file = tool_parameters.get("file")
        if not file:
            yield self.create_text_message("Error: file is required")
            return
        model_config = tool_parameters.get("model")
        doc = fitz.open(stream=file.blob, filetype="pdf")
        try:
            num_pages = len(doc)
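            # 1) Prefer the TOC embedded in the PDF metadata
            catalog = self._catalog_from_metadata(doc.get_toc(), num_pages)
            # 2) If the metadata has no TOC, parse with the LLM
            if not catalog and model_config:
                catalog = self._extract_toc_with_llm(doc, num_pages, model_config)
            # 3) Fall back to regex parsing when no LLM is configured or the LLM returns nothing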
            if not catalog:
                toc_start, toc_end = self._find_toc_pages(doc, num_pages)
                if toc_start is not None and toc_end is not None:
                    toc_text = "\n".join(
                        doc[index].get_text() or "" for index in range(toc_start, toc_end + 1)
                    )
                    printed_catalog = self._parse_toc_lines(toc_text)
                    catalog = self._attach_page_indexes(printed_catalog, toc_end, num_pages)
            if not catalog:
                catalog = []
            yield self.create_text_message(json.dumps(catalog, ensure_ascii=False))
        finally:
            doc.close()
    def _extract_toc_with_llm(
        self, doc: fitz.Document, num_pages: int, model_config: dict[str, Any]
    ) -> list[dict[str, int | str]]:
        # First try to locate the TOC pages
        toc_start, toc_end = self._find_toc_pages(doc, num_pages)
        if toc_start is not None and toc_end is not None:
            # TOC pages found: extract their text
            toc_text = "\n".join(
                doc[index].get_text() or "" for index in range(toc_start, toc_end + 1)
            )
            content_offset = toc_end
        else:
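            # No TOC page found: send the first 15 pages, each labeled with its page number, to the LLM
            sample = min(num_pages, 15)
            page_texts = [doc[i].get_text() or "" for i in range(sample)]
            if not any(text.strip() for text in page_texts):
                return []
            toc_text = "\n\n".join(
                f"--- 第{i + 1}页 ---\n{text}" for i, text in enumerate(page_texts)
            ).strip()
            content_offset = 0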
        # Truncate overly long text
        if len(toc_text) > 15000:
            toc_text = toc_text[:15000] + "\n...[截断]"
        try:
            response = self.session.model.llm.invoke(
                model_config=LLMModelConfig(**model_config),
                prompt_messages=[
                    SystemPromptMessage(content=_TOC_SYSTEM_PROMPT),
                    UserPromptMessage(content=toc_text),
                ],
                stream=False,
            )
            llm_text = self._get_response_text(response)
            if not llm_text:
                return []
            raw_catalog = self._parse_llm_json(llm_text)
            if not raw_catalog:
                return []
            # Convert the simple format returned by the LLM into a full catalog
            return self._build_catalog_from_llm(raw_catalog, content_offset, num_pages)
        except Exception:
            return []
    def _build_catalog_from_llm(
        self, raw: list[dict], content_offset: int, num_pages: int
    ) -> list[dict[str, int | str]]:
        entries: list[tuple[str, int]] = []
        for item in raw:
            title = str(item.get("title") or "").strip()
            page = self._to_int(item.get("page"), None)
            if not title or page is None:
                continue
            entries.append((title, page))
        if not entries:
            return []
        # Compute the offset: the difference between the first entry's printed page number and the actual content start page
        first_printed_page = entries[0][1]
        offset = (content_offset + 1) - first_printed_page if content_offset > 0 else 0
        result: list[dict[str, int | str]] = []
        for i, (title, page) in enumerate(entries):
            next_page = entries[i + 1][1] if i + 1 < len(entries) else page
            page_start_index = max(0, min(page + offset - 1, num_pages - 1))
            page_end_index = max(page_start_index, min(next_page + offset - 2, num_pages - 1))
            if i == len(entries) - 1:
                page_end_index = num_pages - 1
            result.append({
                "title": title,
                "start": page,
                "end": max(page, next_page - 1) if i + 1 < len(entries) else page,
                "page_start_index": page_start_index,
                "page_end_index": page_end_index,
            })
        return result
    @staticmethod
    def _get_response_text(response: Any) -> str:
        if not hasattr(response, "message") or not response.message:
            return ""
        content = response.message.content
        if isinstance(content, str):
            text = content
        elif isinstance(content, list):
            text = "".join(
                item.data if hasattr(item, "data") else str(item) for item in content
            )
        else:
            text = str(content)
        # Strip thinking tags
        text = re.sub(r"<think>[\s\S]*?</think>", "", text, flags=re.IGNORECASE)
        text = re.sub(r"<\|[^>]+\|>", "", text)
        return text.strip()
    @staticmethod
    def _parse_llm_json(text: str) -> list[dict]:
        # Try to extract a JSON code block
        code_match = re.search(r"```(?:json)?\s*([\s\S]*?)```", text)
        if code_match:
            text = code_match.group(1).strip()
        # Try to locate the JSON array
        bracket_match = re.search(r"\[[\s\S]*\]", text)
        if bracket_match:
            text = bracket_match.group(0)
        try:
            result = json.loads(text)
            if isinstance(result, list):
                return result
        except Exception:
            pass
        return []
    def _catalog_from_metadata(self, toc: list, num_pages: int) -> list[dict[str, int | str]]:
        top = [(title, max(0, page - 1)) for level, title, page in toc if level <= 2 and page >= 1]
        if not top:
            return []
        result: list[dict[str, int | str]] = []
        for index, (title, start_index) in enumerate(top):
            end_index = top[index + 1][1] - 1 if index + 1 < len(top) else num_pages - 1
            result.append({
                "title": title,
                "start": start_index + 1,
                "end": max(start_index, end_index) + 1,
                "page_start_index": start_index,
                "page_end_index": max(start_index, end_index),
            })
        return result
    def _find_toc_pages(self, doc: fitz.Document, num_pages: int) -> tuple[int | None, int | None]:
        toc_start = None
        toc_end = None
        for page_number in range(min(num_pages, 30)):
            text = doc[page_number].get_text() or ""
            if any(re.search(pattern, text, re.IGNORECASE) for pattern in self._TOC_PATTERNS):
                if toc_start is None:
                    toc_start = page_number
                toc_end = page_number
            elif toc_start is not None:
                break
        return toc_start, toc_end
    def _parse_toc_lines(self, text: str) -> list[dict[str, int | str]]:
        marker = re.search(
            r"^(List\s+of\s+Figures|List\s+of\s+Tables|图目录|表目录)",
            text,
            re.IGNORECASE | re.MULTILINE,
        )
        if marker:
            text = text[: marker.start()]
        pattern = re.compile(r"^\s*(?P<title>.+?)\s*(?:\.{2,}|\s)\s*(?P<page>\d{1,5})\s*$")
        entries: list[tuple[str, int]] = []
        for raw in text.splitlines():
            line = raw.strip()
            if not line or len(line) < 3 or re.fullmatch(r"\d+", line):
                continue
            match = pattern.match(line)
            if not match:
                continue
            title = re.sub(r"\s+", " ", match.group("title")).strip("-_:：")
            page = self._to_int(match.group("page"), None)
            if not title or page is None or len(title) <= 1:
                continue
            if title.lower() in {"page", "pages", "目录", "contents"}:
                continue
            entries.append((title, page))
        if not entries:
            return []
        dedup: OrderedDict[str, int] = OrderedDict()
        for title, page in entries:
            dedup.setdefault(title, page)
        titles = list(dedup.keys())
        pages = [dedup[title] for title in titles]
        result: list[dict[str, int | str]] = []
        for index, title in enumerate(titles):
            start = pages[index]
            end = max(start, pages[index + 1] - 1) if index + 1 < len(pages) else start
            result.append({"title": title, "start": start, "end": end})
        return result
    def _attach_page_indexes(
        self, catalog: list[dict[str, int | str]], toc_end: int, num_pages: int
    ) -> list[dict[str, int | str]]:
        if not catalog:
            return []
        first_page = None
        for item in catalog:
            start = self._to_int(item.get("start"), None)
            if start is not None and (first_page is None or start < first_page):
                first_page = start
        if first_page is None:
            return []
        offset = (toc_end + 1) - first_page
        result: list[dict[str, int | str]] = []
        for item in catalog:
            start = self._to_int(item.get("start"), None)
            end = self._to_int(item.get("end"), start)
            if start is None:
                continue
            if end is None:
                end = start
            page_start_index = max(0, min(start + offset, num_pages - 1))
            page_end_index = max(page_start_index, min(end + offset, num_pages - 1))
            result.append({
                "title": str(item.get("title") or "Untitled"),
                "start": start,
                "end": max(start, end),
                "page_start_index": page_start_index,
                "page_end_index": page_end_index,
            })
        return result
    @staticmethod
    def _to_int(value: Any, default: int | None) -> int | None:
        try:
            if value is None or value == "":
                return default
            return int(value)
        except Exception:
            return default
--- a/difyPlugin/pdf/tools/pdf_toc.yaml
+++ b/difyPlugin/pdf/tools/pdf_toc.yaml
@@ -1,51 +0,0 @@
 identity:
  name: "pdf_toc"
  author: "yslg"
  label:
    en_US: "PDF TOC"
    zh_Hans: "PDF 目录提取"
    pt_BR: "PDF TOC"
    ja_JP: "PDF TOC"
 description:
  human:
    en_US: "Extract the catalog array from a PDF file using metadata or LLM."
    zh_Hans: "从PDF文件中提取目录数组，优先使用元数据，回退使用LLM解析。"
    pt_BR: "Extrair o array de catálogo de um arquivo PDF."
    ja_JP: "PDFファイルからカタログ配列を抽出する。"
  llm: "Extract a catalog array from a PDF file. Returns JSON text like [{title,start,end,page_start_index,page_end_index}]."
 parameters:
  - name: file
    type: file
    required: true
    label:
      en_US: PDF File
      zh_Hans: PDF 文件
      pt_BR: PDF File
      ja_JP: PDF File
    human_description:
      en_US: "PDF file to inspect"
      zh_Hans: "要解析的PDF文件"
      pt_BR: "PDF file to inspect"
      ja_JP: "PDF file to inspect"
    llm_description: "PDF file to extract catalog from"
    form: llm
    fileTypes:
      - "pdf"
  - name: model
    type: model-selector
    scope: llm
    required: true
    label:
      en_US: LLM Model
      zh_Hans: LLM 模型
      pt_BR: Modelo LLM
      ja_JP: LLMモデル
    human_description:
      en_US: "LLM model used for parsing TOC when metadata is unavailable"
      zh_Hans: "当元数据不可用时，用于解析目录的LLM模型"
      pt_BR: "Modelo LLM para análise de TOC"
      ja_JP: "メタデータが利用できない場合のTOC解析用LLMモデル"
    form: form
 extra:
  python:
    source: tools/pdf_toc.py
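For review context, the regex fallback in `pdf_toc.py` maps printed page numbers to 0-based page indexes with a single offset, as in `_attach_page_indexes`. The sketch below works through that arithmetic. The function name is hypothetical, and it relies on the same assumption the plugin makes: the first printed page sits immediately after the last TOC page.

```python
def printed_page_to_index(
    printed_page: int,
    first_printed_page: int,
    toc_end_index: int,
    num_pages: int,
) -> int:
    """Hypothetical helper: map a printed page number to a 0-based page index."""
    # The lowest printed page number is assumed to land on the page after the TOC.
    offset = (toc_end_index + 1) - first_printed_page
    # Clamp into the valid index range of the document.
    return max(0, min(printed_page + offset, num_pages - 1))


# A 100-page PDF whose TOC ends at index 4 and whose first chapter is printed as page 1.
first_chapter_index = printed_page_to_index(1, 1, 4, 100)
tenth_page_index = printed_page_to_index(10, 1, 4, 100)
```

If the document has a preface or blank pages between the TOC and the first numbered page, this offset will be wrong by that many pages, so the result is best treated as an estimate.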
--- a/difyPlugin/数据清洗-大文件处理.yml.bak
+++ b/difyPlugin/数据清洗-大文件处理.yml.bak
--- a/difyPlugin/需求文档.md
+++ b/difyPlugin/需求文档.md
@@ -1,122 +0,0 @@
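 # Dify Plugin Service Requirements
 ## 1. Project Overview
 Build a Dify plugin service on the FastAPI framework that integrates with the Dify platform, supports deploying and managing multiple plugins, and provides a range of functional extensions.
 ## 2. Tech Stack
 - **Framework**: FastAPI
 - **Language**: Python 3.9+
 - **Dependency management**: Poetry or pip
 - **Deployment**: Docker containers
 ## 3. Project Architecture
 ### 3.1 Architecture Design
 - **Plugin management system**: manages multiple Dify plugins in one place
 - **Plugin loading**: supports dynamic loading and hot updates of plugins
 - **Plugin isolation**: each plugin runs in its own environment
 - **API gateway**: a single API entry point that routes requests to the matching plugin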
 ### 3.2 Directory Structure
 ```
 difyPlugin/
 ├── main.py              # Application entry point
 ├── requirements.txt     # Dependencies
 ├── .env                 # Environment configuration
 ├── app/
 │   ├── api/             # API routes
 │   ├── core/            # Core configuration
 │   ├── plugins/         # Plugin directory
 │   │   ├── plugin1/     # Plugin 1
 │   │   ├── plugin2/     # Plugin 2
 │   │   └── __init__.py  # Plugin loader
 │   └── services/        # Shared services
 └── tests/               # Tests
 ```
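 ### 3.3 Plugin Specification
 - **Plugin structure**: each plugin has its own configuration, logic, and API
 - **Plugin interface**: a unified plugin interface specification
 - **Plugin registration**: plugins are discovered and registered automatically
 - **Plugin lifecycle**: plugins can be started, stopped, and restarted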
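
The plugin specification above could be sketched roughly as follows. This is a minimal illustration using only the standard library. The names `Plugin`, `PluginRegistry`, and `EchoPlugin` are assumptions and do not come from the Dify SDK or from this repository.

```python
import inspect
from abc import ABC, abstractmethod
from typing import Any, Dict, Set


class Plugin(ABC):
    """Hypothetical unified plugin interface."""

    plugin_id: str = ""

    def start(self) -> None:
        """Lifecycle hook: called before the plugin serves requests."""

    def stop(self) -> None:
        """Lifecycle hook: called when the plugin is shut down."""

    @abstractmethod
    def execute(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        """Run the plugin's logic and return a JSON-serializable result."""


class PluginRegistry:
    """Hypothetical registry that discovers plugins and tracks their lifecycle."""

    def __init__(self) -> None:
        self._plugins: Dict[str, Plugin] = {}
        self._running: Set[str] = set()

    def register(self, plugin: Plugin) -> None:
        if not plugin.plugin_id:
            raise ValueError("plugin_id must be set")
        self._plugins[plugin.plugin_id] = plugin

    def discover(self) -> None:
        # Simplified discovery: register every concrete direct subclass of Plugin.
        for cls in Plugin.__subclasses__():
            if inspect.isabstract(cls):
                continue
            if cls.plugin_id and cls.plugin_id not in self._plugins:
                self.register(cls())

    def start(self, plugin_id: str) -> None:
        self._plugins[plugin_id].start()
        self._running.add(plugin_id)

    def stop(self, plugin_id: str) -> None:
        self._plugins[plugin_id].stop()
        self._running.discard(plugin_id)

    def restart(self, plugin_id: str) -> None:
        self.stop(plugin_id)
        self.start(plugin_id)

    def execute(self, plugin_id: str, payload: Dict[str, Any]) -> Dict[str, Any]:
        if plugin_id not in self._running:
            raise RuntimeError(f"plugin {plugin_id!r} is not running")
        return self._plugins[plugin_id].execute(payload)


class EchoPlugin(Plugin):
    """Example plugin that returns its input unchanged."""

    plugin_id = "echo"

    def execute(self, payload: Dict[str, Any]) -> Dict[str, Any]:
        return {"echo": payload}


registry = PluginRegistry()
registry.discover()
registry.start("echo")
result = registry.execute("echo", {"msg": "hi"})
```

Discovery through `__subclasses__()` only sees classes that are already imported. A real loader would probably import the packages under `app/plugins/` first, and this sketch does not address process-level isolation.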
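 ## 4. Core Features
 ### 4.1 Basic Features
 - **Health check**: an endpoint that reports service status
 - **Version management**: supports plugin versioning
 - **Authentication**: secure authentication with Dify
 - **Plugin management**: supports registering, starting, stopping, and uninstalling plugins
 ### 4.2 Business Features
 - **Data processing**: supports converting and processing a variety of data formats
 - **External API integration**: connects to third-party service APIs
 - **Custom logic**: supports user-defined business logic
 - **Event handling**: responds to events triggered by the Dify platform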
 ## 5. API Design
 ### 5.1 Main Endpoints
 - `GET /health`: health check
 - `GET /api/v1/plugins`: list plugins
 - `GET /api/v1/plugins/{plugin_id}`: get plugin details
 - `POST /api/v1/plugins/{plugin_id}/execute`: execute a plugin function
 - `GET /api/v1/plugins/{plugin_id}/metadata`: get plugin metadata
 - `POST /api/v1/plugins/{plugin_id}/start`: start a plugin
 - `POST /api/v1/plugins/{plugin_id}/stop`: stop a plugin
 ### 5.2 Request/Response Format
 - **Request format**: JSON
 - **Response format**: JSON, containing a status code and data
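
A response envelope matching the format above might look like the sketch below. The field names `code`, `message`, and `data` are assumptions, since the requirements only state that a response carries a status code and data.

```python
import json
from typing import Any


def make_response(code: int, data: Any = None, message: str = "ok") -> str:
    """Hypothetical helper: wrap a result in a JSON envelope with a status code."""
    envelope = {"code": code, "message": message, "data": data}
    # ensure_ascii=False keeps non-ASCII text such as Chinese titles readable.
    return json.dumps(envelope, ensure_ascii=False)


# Example body for GET /api/v1/plugins with no plugins registered.
body = make_response(200, {"plugins": []})
# Example error body for an unknown plugin_id.
error_body = make_response(404, message="plugin not found")
```

In a FastAPI service the framework would normally handle serialization, so the same shape would more likely be returned as a dictionary or response model than as a pre-encoded string.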
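 ## 6. Deployment Requirements
 - **Environment variables**: service parameters are configurable through environment variables
 - **Logging**: integrated structured logging
 - **Monitoring metrics**: exposes a Prometheus metrics endpoint
 - **Error handling**: thorough error handling and exception capture
 - **Plugin isolation**: supports independent deployment and isolation of plugins
 ## 7. Integration
 - **Dify plugin registration**: register according to the Dify plugin specification
 - **Webhook configuration**: supports webhook callbacks from the Dify platform
 - **Event subscription**: subscribes to Dify platform events
 - **Plugin discovery**: discovers and registers new plugins automatically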
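 ## 8. Development Plan
 ### 8.1 Phase 1: Project Initialization
 - Create the FastAPI project structure
 - Configure dependency management
 - Implement the plugin management system
 ### 8.2 Phase 2: Core Feature Development
 - Implement the plugin loading mechanism
 - Develop the plugin interface specification
 - Implement data processing features
 - Integrate external APIs
 ### 8.3 Phase 3: Testing and Deployment
 - Write unit tests
 - Run integration tests
 - Containerize the deployment
 - Develop example plugins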
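 ## 9. Technical Requirements
 - **Code quality**: follow the PEP 8 style guide
 - **Documentation**: complete API documentation
 - **Performance**: optimize response time and resource usage
 - **Security**: implement secure authentication and authorization
 - **Extensibility**: support adding and removing plugins dynamically
 ## 10. Deliverables
 - **Source code**: the complete project code
 - **Deployment documentation**: detailed deployment steps
 - **API documentation**: automatically generated API docs
 - **Test report**: test results and coverage report
 - **Plugin development guide**: a guide to developing and registering plugins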
--- a/docs/AI训练资料/2.pdf
+++ b/docs/AI训练资料/2.pdf