Update
1
.gitignore
vendored
@@ -7,3 +7,4 @@
 **/*.difypkg
 urbanLifeServ/*
 */.data
+docs
Submodule ai-management-dify updated: 9fffb6e421...0de13a3495
Submodule ai-management-platform updated: 6bbe3e4181...085ef040ae
@@ -1,146 +0,0 @@
> ## Documentation Index
> Fetch the complete documentation index at: https://docs.dify.ai/llms.txt
> Use this file to discover all available pages before exploring further.

# CLI

> Command-line interface for Dify plugin development

<Note> ⚠️ This document was automatically translated by AI. If anything is inaccurate, please refer to the [English original](/en/develop-plugin/getting-started/cli).</Note>

Use the command-line interface (CLI) to set up and package your Dify plugin. The CLI streamlines the plugin development workflow, from initialization to packaging.

This guide walks you through using the CLI for Dify plugin development.

## Prerequisites

Before you begin, make sure the following are installed:

* Python version ≥ 3.12
* Dify CLI
* Homebrew (for Mac users)

## Create a Dify Plugin Project

<Tabs>
  <Tab title="Mac">
    ```bash theme={null}
    brew tap langgenius/dify
    brew install dify
    ```
  </Tab>

  <Tab title="Linux">
    Get the latest Dify CLI from the [Dify GitHub releases page](https://github.com/langgenius/dify-plugin-daemon/releases).

    ```bash theme={null}
    # Download dify-plugin-linux-amd64 (or the build that matches your architecture)
    chmod +x dify-plugin-linux-amd64
    mv dify-plugin-linux-amd64 dify
    sudo mv dify /usr/local/bin/
    ```
  </Tab>
</Tabs>

The Dify CLI is now installed. You can verify the installation by running:

```bash theme={null}
dify version
```

Create a new Dify plugin project with the following command:

```bash theme={null}
dify plugin init
```

Fill in the required fields when prompted:

```bash theme={null}
Edit profile of the plugin
Plugin name (press Enter to next step): hello-world
Author (press Enter to next step): langgenius
Description (press Enter to next step): hello world example
Repository URL (Optional) (press Enter to next step): Repository URL (Optional)
Enable multilingual README: [✔] English is required by default

Languages to generate:
    English: [✔] (required)
  → 简体中文 (Simplified Chinese): [✔]
    日本語 (Japanese): [✘]
    Português (Portuguese - Brazil): [✘]

Controls:
  ↑/↓ Navigate • Space/Tab Toggle selection • Enter Next step
```

Select `python` and press Enter to continue with the Python plugin template.

```bash theme={null}
Select the type of plugin you want to create, and press `Enter` to continue
Before starting, here's some basic knowledge about Plugin types in Dify:

- Tool: Tool Providers like Google Search, Stable Diffusion, etc. Used to perform specific tasks.
- Model: Model Providers like OpenAI, Anthropic, etc. Use their models to enhance AI capabilities.
- Endpoint: Similar to Service API in Dify and Ingress in Kubernetes. Extend HTTP services as endpoints with custom logi
- Agent Strategy: Implement your own agent strategies like Function Calling, ReAct, ToT, CoT, etc.

Based on the ability you want to extend, Plugins are divided into four types: Tool, Model, Extension, and Agent Strategy

- Tool: A tool provider that can also implement endpoints. For example, building a Discord Bot requires both Sending and
- Model: Strictly for model providers, no other extensions allowed.
- Extension: For simple HTTP services that extend functionality.
- Agent Strategy: Implement custom agent logic with a focused approach.

We've provided templates to help you get started. Choose one of the options below:
-> tool
   agent-strategy
   llm
   text-embedding
   rerank
   tts
   speech2text
   moderation
   extension
```

Enter the minimal Dify version, or leave it blank to use the latest version:

```bash theme={null}
Edit minimal Dify version requirement, leave it blank by default
Minimal Dify version (press Enter to next step):
```

You are all set. The CLI creates a new directory named after the plugin name you provided and sets up the basic structure for your plugin.

```bash theme={null}
cd hello-world
```

## Run the Plugin

Make sure you are in the hello-world directory:

```bash theme={null}
cp .env.example .env
```

Edit the `.env` file to set the plugin's environment variables, such as the API key and other configuration. You can find these values in the Dify dashboard: log in to your Dify environment, click the "Plugins" icon in the top-right corner, then click the debug icon (it looks like a bug). In the pop-up window, copy the "API Key" and "Host Address". (Refer to the corresponding local screenshot, which shows the screen for obtaining the key and host address.)

```bash theme={null}
INSTALL_METHOD=remote
REMOTE_INSTALL_HOST=debug-plugin.dify.dev
REMOTE_INSTALL_PORT=5003
REMOTE_INSTALL_KEY=********-****-****-****-************
```
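
Before starting the plugin, the `.env` values above can be sanity-checked with a few lines of Python. This is a minimal sketch, not part of the Dify CLI or SDK: the names `parse_env`, `missing_debug_keys`, and `REQUIRED_KEYS` are hypothetical, and the list of required keys is assumed from the example above.

```python
# Hypothetical helper: the key list is an assumption taken from the example .env above.
REQUIRED_KEYS = (
    "INSTALL_METHOD",
    "REMOTE_INSTALL_HOST",
    "REMOTE_INSTALL_PORT",
    "REMOTE_INSTALL_KEY",
)


def parse_env(text: str) -> dict[str, str]:
    """Parse simple KEY=VALUE lines, skipping blank lines and # comments."""
    values: dict[str, str] = {}
    for raw in text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#") or "=" not in line:
            continue
        # Split on the first "=" only, so values may themselves contain "=".
        key, _, value = line.partition("=")
        values[key.strip()] = value.strip()
    return values


def missing_debug_keys(values: dict[str, str]) -> list[str]:
    """Return the required remote-debug keys that are absent or empty."""
    return [key for key in REQUIRED_KEYS if not values.get(key)]
```

Calling `missing_debug_keys(parse_env(open(".env").read()))` returns an empty list when every key is set, which makes a missing debug key easy to spot before running the plugin.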

You can now run your plugin locally with the following commands:

```bash theme={null}
pip install -r requirements.txt
python -m main
```
@@ -1,184 +0,0 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
.python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Vscode
.vscode/

# Git
.git/
.gitignore
.github/

# Mac
.DS_Store

# Windows
Thumbs.db

# Dify plugin packages
# To prevent packaging repetitively
*.difypkg

@@ -1,3 +0,0 @@
INSTALL_METHOD=remote
REMOTE_INSTALL_URL=debug.dify.ai:5003
REMOTE_INSTALL_KEY=********-****-****-****-************
109
difyPlugin/pdf/.github/workflows/plugin-publish.yml
vendored
@@ -1,109 +0,0 @@
name: Plugin Publish Workflow

on:
  release:
    types: [published]

jobs:
  publish:
    runs-on: ubuntu-latest
    steps:
      - name: Checkout code
        uses: actions/checkout@v3

      - name: Download CLI tool
        run: |
          mkdir -p $RUNNER_TEMP/bin
          cd $RUNNER_TEMP/bin

          wget https://github.com/langgenius/dify-plugin-daemon/releases/download/0.0.6/dify-plugin-linux-amd64
          chmod +x dify-plugin-linux-amd64

          echo "CLI tool location:"
          pwd
          ls -la dify-plugin-linux-amd64

      - name: Get basic info from manifest
        id: get_basic_info
        run: |
          PLUGIN_NAME=$(grep "^name:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin name: $PLUGIN_NAME"
          echo "plugin_name=$PLUGIN_NAME" >> $GITHUB_OUTPUT

          VERSION=$(grep "^version:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin version: $VERSION"
          echo "version=$VERSION" >> $GITHUB_OUTPUT

          # If the author's name is not your github username, you can change the author here
          AUTHOR=$(grep "^author:" manifest.yaml | cut -d' ' -f2)
          echo "Plugin author: $AUTHOR"
          echo "author=$AUTHOR" >> $GITHUB_OUTPUT

      - name: Package Plugin
        id: package
        run: |
          cd $GITHUB_WORKSPACE
          PACKAGE_NAME="${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}.difypkg"
          $RUNNER_TEMP/bin/dify-plugin-linux-amd64 plugin package . -o "$PACKAGE_NAME"

          echo "Package result:"
          ls -la "$PACKAGE_NAME"
          echo "package_name=$PACKAGE_NAME" >> $GITHUB_OUTPUT

          echo "\nFull file path:"
          pwd
          echo "\nDirectory structure:"
          tree || ls -R

      - name: Checkout target repo
        uses: actions/checkout@v3
        with:
          repository: ${{steps.get_basic_info.outputs.author}}/dify-plugins
          path: dify-plugins
          token: ${{ secrets.PLUGIN_ACTION }}
          fetch-depth: 1
          persist-credentials: true

      - name: Prepare and create PR
        run: |
          PACKAGE_NAME="${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}.difypkg"
          mkdir -p dify-plugins/${{ steps.get_basic_info.outputs.author }}/${{ steps.get_basic_info.outputs.plugin_name }}
          mv "$PACKAGE_NAME" dify-plugins/${{ steps.get_basic_info.outputs.author }}/${{ steps.get_basic_info.outputs.plugin_name }}/

          cd dify-plugins

          git config user.name "GitHub Actions"
          git config user.email "actions@github.com"

          git fetch origin main
          git checkout main
          git pull origin main

          BRANCH_NAME="bump-${{ steps.get_basic_info.outputs.plugin_name }}-plugin-${{ steps.get_basic_info.outputs.version }}"
          git checkout -b "$BRANCH_NAME"

          git add .
          git commit -m "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin to version ${{ steps.get_basic_info.outputs.version }}"

          git push -u origin "$BRANCH_NAME" --force

          git branch -a
          echo "Waiting for branch to sync..."
          sleep 10 # Wait 10 seconds for branch sync

      - name: Create PR via GitHub API
        env:
          # How to config the token:
          # 1. Profile -> Settings -> Developer settings -> Personal access tokens -> Generate new token (with repo scope) -> Copy the token
          # 2. Go to the target repository -> Settings -> Secrets and variables -> Actions -> New repository secret -> Add the token as PLUGIN_ACTION
          GH_TOKEN: ${{ secrets.PLUGIN_ACTION }}
        run: |
          gh pr create \
            --repo langgenius/dify-plugins \
            --head "${{ steps.get_basic_info.outputs.author }}:${{ steps.get_basic_info.outputs.plugin_name }}-${{ steps.get_basic_info.outputs.version }}" \
            --base main \
            --title "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin to version ${{ steps.get_basic_info.outputs.version }}" \
            --body "bump ${{ steps.get_basic_info.outputs.plugin_name }} plugin package to version ${{ steps.get_basic_info.outputs.version }}

            Changes:
            - Updated plugin package file" || echo "PR already exists or creation skipped." # Handle cases where PR already exists
176
difyPlugin/pdf/.gitignore
vendored
@@ -1,176 +0,0 @@
# Byte-compiled / optimized / DLL files
__pycache__/
*.py[cod]
*$py.class

# C extensions
*.so

# Distribution / packaging
.Python
build/
develop-eggs/
dist/
downloads/
eggs/
.eggs/
lib/
lib64/
parts/
sdist/
var/
wheels/
share/python-wheels/
*.egg-info/
.installed.cfg
*.egg
MANIFEST

# PyInstaller
# Usually these files are written by a python script from a template
# before PyInstaller builds the exe, so as to inject date/other infos into it.
*.manifest
*.spec

# Installer logs
pip-log.txt
pip-delete-this-directory.txt

# Unit test / coverage reports
htmlcov/
.tox/
.nox/
.coverage
.coverage.*
.cache
nosetests.xml
coverage.xml
*.cover
*.py,cover
.hypothesis/
.pytest_cache/
cover/

# Translations
*.mo
*.pot

# Django stuff:
*.log
local_settings.py
db.sqlite3
db.sqlite3-journal

# Flask stuff:
instance/
.webassets-cache

# Scrapy stuff:
.scrapy

# Sphinx documentation
docs/_build/

# PyBuilder
.pybuilder/
target/

# Jupyter Notebook
.ipynb_checkpoints

# IPython
profile_default/
ipython_config.py

# pyenv
# For a library or package, you might want to ignore these files since the code is
# intended to run in multiple environments; otherwise, check them in:
# .python-version

# pipenv
# According to pypa/pipenv#598, it is recommended to include Pipfile.lock in version control.
# However, in case of collaboration, if having platform-specific dependencies or dependencies
# having no cross-platform support, pipenv may install dependencies that don't work, or not
# install all needed dependencies.
#Pipfile.lock

# UV
# Similar to Pipfile.lock, it is generally recommended to include uv.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
#uv.lock

# poetry
# Similar to Pipfile.lock, it is generally recommended to include poetry.lock in version control.
# This is especially recommended for binary packages to ensure reproducibility, and is more
# commonly ignored for libraries.
# https://python-poetry.org/docs/basic-usage/#commit-your-poetrylock-file-to-version-control
#poetry.lock

# pdm
# Similar to Pipfile.lock, it is generally recommended to include pdm.lock in version control.
#pdm.lock
# pdm stores project-wide configurations in .pdm.toml, but it is recommended to not include it
# in version control.
# https://pdm.fming.dev/latest/usage/project/#working-with-version-control
.pdm.toml
.pdm-python
.pdm-build/

# PEP 582; used by e.g. github.com/David-OConnor/pyflow and github.com/pdm-project/pdm
__pypackages__/

# Celery stuff
celerybeat-schedule
celerybeat.pid

# SageMath parsed files
*.sage.py

# Environments
.env
.venv
env/
venv/
ENV/
env.bak/
venv.bak/

# Spyder project settings
.spyderproject
.spyproject

# Rope project settings
.ropeproject

# mkdocs documentation
/site

# mypy
.mypy_cache/
.dmypy.json
dmypy.json

# Pyre type checker
.pyre/

# pytype static type analyzer
.pytype/

# Cython debug symbols
cython_debug/

# PyCharm
# JetBrains specific template is maintained in a separate JetBrains.gitignore that can
# be found at https://github.com/github/gitignore/blob/main/Global/JetBrains.gitignore
# and can be added to the global gitignore or merged into this file. For a more nuclear
# option (not recommended) you can uncomment the following to ignore the entire idea folder.
.idea/

# Vscode
.vscode/

# macOS
.DS_Store
.AppleDouble
.LSOverride
@@ -1,137 +0,0 @@
# Dify Plugin Development Guide

Welcome to Dify plugin development! This guide will help you get started quickly.

## Plugin Types

Dify plugins extend three main capabilities:

| Type | Description | Example |
|------|-------------|---------|
| **Tool** | Perform specific tasks | Google Search, Stable Diffusion |
| **Model** | AI model integrations | OpenAI, Anthropic |
| **Endpoint** | HTTP services | Custom APIs, integrations |

You can create:
- **Tool**: Tool provider with optional endpoints (e.g., Discord bot)
- **Model**: Model provider only
- **Extension**: Simple HTTP service

## Setup

### Requirements
- Python 3.11+
- Dependencies: `pip install -r requirements.txt`

## Development Process

<details>
<summary><b>1. Manifest Structure</b></summary>

Edit `manifest.yaml` to describe your plugin:

```yaml
version: 0.1.0 # Required: Plugin version
type: plugin # Required: plugin or bundle
author: YourOrganization # Required: Organization name
label: # Required: Multi-language names
  en_US: Plugin Name
  zh_Hans: 插件名称
created_at: 2023-01-01T00:00:00Z # Required: Creation time (RFC3339)
icon: assets/icon.png # Required: Icon path

# Resources and permissions
resource:
  memory: 268435456 # Max memory (bytes)
  permission:
    tool:
      enabled: true # Tool permission
    model:
      enabled: true # Model permission
      llm: true
      text_embedding: false
      # Other model types...
    # Other permissions...

# Extensions definition
plugins:
  tools:
    - tools/my_tool.yaml # Tool definition files
  models:
    - models/my_model.yaml # Model definition files
  endpoints:
    - endpoints/my_api.yaml # Endpoint definition files

# Runtime metadata
meta:
  version: 0.0.1 # Manifest format version
  arch:
    - amd64
    - arm64
  runner:
    language: python
    version: "3.12"
    entrypoint: main
```

**Restrictions:**
- Cannot extend both tools and models
- Must have at least one extension
- Cannot extend both models and endpoints
- Limited to one supplier per extension type
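
The restrictions above can be expressed as a small check over the `plugins` section of a parsed manifest. This is a rough sketch, not the validation Dify itself performs: the function name `check_plugin_extensions` is hypothetical, and it assumes "one supplier per extension type" means at most one definition file under each of `tools`, `models`, and `endpoints`.

```python
def check_plugin_extensions(plugins: dict) -> list[str]:
    """Return a list of violated restrictions (empty when the section looks valid).

    `plugins` is assumed to be the already-parsed `plugins:` mapping of manifest.yaml.
    """
    tools = plugins.get("tools") or []
    models = plugins.get("models") or []
    endpoints = plugins.get("endpoints") or []

    problems: list[str] = []
    if not (tools or models or endpoints):
        problems.append("at least one extension is required")
    if tools and models:
        problems.append("cannot extend both tools and models")
    if models and endpoints:
        problems.append("cannot extend both models and endpoints")
    # Assumption: "one supplier per extension type" means one definition file per list.
    for kind, entries in (("tools", tools), ("models", models), ("endpoints", endpoints)):
        if len(entries) > 1:
            problems.append(f"only one provider is allowed under '{kind}'")
    return problems
```

For example, a manifest that declares only `tools: [tools/my_tool.yaml]` passes, while one that declares both `tools` and `models` is reported.
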
</details>

<details>
<summary><b>2. Implementation Examples</b></summary>

Study these examples to understand plugin implementation:

- [OpenAI](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/openai) - Model provider
- [Google Search](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/google) - Tool provider
- [Neko](https://github.com/langgenius/dify-plugin-sdks/tree/main/python/examples/neko) - Endpoint group
</details>

<details>
<summary><b>3. Testing & Debugging</b></summary>

1. Copy `.env.example` to `.env` and configure:
   ```
   INSTALL_METHOD=remote
   REMOTE_INSTALL_URL=debug.dify.ai:5003
   REMOTE_INSTALL_KEY=your-debug-key
   ```

2. Run your plugin:
   ```bash
   python -m main
   ```

3. Refresh your Dify instance to see the plugin (marked as "debugging")
</details>

<details>
<summary><b>4. Publishing</b></summary>

#### Manual Packaging
```bash
dify-plugin plugin package ./YOUR_PLUGIN_DIR
```

#### Automated GitHub Workflow

Configure GitHub Actions to automate PR creation:

1. Create a Personal Access Token for your forked repository
2. Add it as `PLUGIN_ACTION` secret in your source repo
3. Create `.github/workflows/plugin-publish.yml`

When you create a release, the action will:
- Package your plugin
- Create a PR to your fork
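
To make the packaging step concrete, here is a hedged Python sketch of how a publish workflow might derive the package file name from `manifest.yaml`. It assumes top-level `name:` and `version:` keys that start their lines and a `<name>-<version>.difypkg` naming scheme; the helper names `manifest_field` and `package_name` are illustrative, not part of any Dify tool.

```python
def manifest_field(manifest_text: str, key: str) -> str:
    """Return the value after the first line that starts with '<key>:'.

    Only unindented keys match, so a nested 'version:' under 'meta:' is ignored.
    """
    prefix = f"{key}:"
    for line in manifest_text.splitlines():
        if line.startswith(prefix):
            # Take the second space-separated field, i.e. the value right after the key.
            parts = line.split(" ")
            return parts[1].strip() if len(parts) > 1 else ""
    return ""


def package_name(manifest_text: str) -> str:
    """Build the assumed '<name>-<version>.difypkg' file name."""
    name = manifest_field(manifest_text, "name")
    version = manifest_field(manifest_text, "version")
    return f"{name}-{version}.difypkg"
```

Because only the second space-separated field is used, values containing spaces would be cut short; a real YAML parser is the safer choice when the manifest is more complex.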

[Detailed workflow documentation](https://docs.dify.ai/plugins/publish-plugins/plugin-auto-publish-pr)
</details>

## Privacy Policy

If publishing to the Marketplace, provide a privacy policy in [PRIVACY.md](PRIVACY.md).
@@ -1,3 +0,0 @@
## Privacy

!!! Please fill in the privacy policy of the plugin.
@@ -1,10 +0,0 @@
## pdf

**Author:** yslg
**Version:** 0.0.1
**Type:** tool

### Description

@@ -1,55 +0,0 @@
<!--
~ Dify Marketplace Template Icon
~ Dify 市场模板图标
~ Dify マーケットプレイステンプレートアイコン
~
~ WARNING / 警告 / 警告:
~
~ English: This is a TEMPLATE icon from Dify Marketplace only. You MUST NOT use this default icon in any way.
~ Please replace it with your own custom icon before submit this plugin.
~
~ 中文: 这只是来自 Dify 市场的模板图标。您绝对不能以任何方式使用此默认图标。
~ 请在提交此插件之前将其替换为您自己的自定义图标。
~
~ 日本語: これは Dify マーケットプレイスのテンプレートアイコンです。このデフォルトアイコンをいかなる方法でも使用してはいけません。
~ このプラグインを提出する前に、独自のカスタムアイコンに置き換えてください。
~
~ DIFY_MARKETPLACE_TEMPLATE_ICON_DO_NOT_USE
-->
<svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg">
<g clip-path="url(#clip0_15253_95095)">
<rect width="40" height="40" fill="#0033FF"/>
<g filter="url(#filter0_n_15253_95095)">
<rect width="40" height="40" fill="url(#paint0_linear_15253_95095)"/>
</g>
<path d="M28 10C28.5523 10 29 10.4477 29 11V16C29 16.5523 28.5523 17 28 17H23V30C23 30.5523 22.5523 31 22 31H18C17.4477 31 17 30.5523 17 30V17H11.5C10.9477 17 10.5 16.5523 10.5 16V13.618C10.5 13.2393 10.714 12.893 11.0528 12.7236L16.5 10H28ZM23 12H16.9721L12.5 14.2361V15H19V29H21V15H23V12ZM27 12H25V15H27V12Z" fill="white"/>
</g>
<defs>
<filter id="filter0_n_15253_95095" x="0" y="0" width="40" height="40" filterUnits="userSpaceOnUse" color-interpolation-filters="sRGB">
<feFlood flood-opacity="0" result="BackgroundImageFix"/>
<feBlend mode="normal" in="SourceGraphic" in2="BackgroundImageFix" result="shape"/>
<feTurbulence type="fractalNoise" baseFrequency="2 2" stitchTiles="stitch" numOctaves="3" result="noise" seed="8033" />
<feComponentTransfer in="noise" result="coloredNoise1">
<feFuncR type="linear" slope="2" intercept="-0.5" />
<feFuncG type="linear" slope="2" intercept="-0.5" />
<feFuncB type="linear" slope="2" intercept="-0.5" />
<feFuncA type="discrete" tableValues="1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "/>
</feComponentTransfer>
<feComposite operator="in" in2="shape" in="coloredNoise1" result="noise1Clipped" />
<feComponentTransfer in="noise1Clipped" result="color1">
<feFuncA type="table" tableValues="0 0.06" />
</feComponentTransfer>
<feMerge result="effect1_noise_15253_95095">
<feMergeNode in="shape" />
<feMergeNode in="color1" />
</feMerge>
</filter>
<linearGradient id="paint0_linear_15253_95095" x1="0" y1="0" x2="40" y2="40" gradientUnits="userSpaceOnUse">
<stop stop-color="#1443FF"/>
<stop offset="1" stop-color="#0031F5"/>
</linearGradient>
<clipPath id="clip0_15253_95095">
<rect width="40" height="40" fill="white"/>
</clipPath>
</defs>
</svg>
Before Width: | Height: | Size: 3.0 KiB |
@@ -1,55 +0,0 @@
<!--
~ Dify Marketplace Template Icon
~ Dify 市场模板图标
~ Dify マーケットプレイステンプレートアイコン
~
~ WARNING / 警告 / 警告:
~
~ English: This is a TEMPLATE icon from Dify Marketplace only. You MUST NOT use this default icon in any way.
~ Please replace it with your own custom icon before submit this plugin.
~
~ 中文: 这只是来自 Dify 市场的模板图标。您绝对不能以任何方式使用此默认图标。
~ 请在提交此插件之前将其替换为您自己的自定义图标。
~
~ 日本語: これは Dify マーケットプレイスのテンプレートアイコンです。このデフォルトアイコンをいかなる方法でも使用してはいけません。
~ このプラグインを提出する前に、独自のカスタムアイコンに置き換えてください。
~
~ DIFY_MARKETPLACE_TEMPLATE_ICON_DO_NOT_USE
-->
<svg width="40" height="40" viewBox="0 0 40 40" fill="none" xmlns="http://www.w3.org/2000/svg">
<g clip-path="url(#clip0_15255_46435)">
<rect width="40" height="40" fill="#0033FF"/>
<g filter="url(#filter0_n_15255_46435)">
<rect width="40" height="40" fill="url(#paint0_linear_15255_46435)"/>
</g>
<path d="M28 10C28.5523 10 29 10.4477 29 11V16C29 16.5523 28.5523 17 28 17H23V30C23 30.5523 22.5523 31 22 31H18C17.4477 31 17 30.5523 17 30V17H11.5C10.9477 17 10.5 16.5523 10.5 16V13.618C10.5 13.2393 10.714 12.893 11.0528 12.7236L16.5 10H28ZM23 12H16.9721L12.5 14.2361V15H19V29H21V15H23V12ZM27 12H25V15H27V12Z" fill="white"/>
</g>
<defs>
<filter id="filter0_n_15255_46435" x="0" y="0" width="40" height="40" filterUnits="userSpaceOnUse" color-interpolation-filters="sRGB">
<feFlood flood-opacity="0" result="BackgroundImageFix"/>
<feBlend mode="normal" in="SourceGraphic" in2="BackgroundImageFix" result="shape"/>
<feTurbulence type="fractalNoise" baseFrequency="2 2" stitchTiles="stitch" numOctaves="3" result="noise" seed="8033" />
<feComponentTransfer in="noise" result="coloredNoise1">
<feFuncR type="linear" slope="2" intercept="-0.5" />
<feFuncG type="linear" slope="2" intercept="-0.5" />
<feFuncB type="linear" slope="2" intercept="-0.5" />
<feFuncA type="discrete" tableValues="1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 1 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 0 "/>
|
|
||||||
</feComponentTransfer>
|
|
||||||
<feComposite operator="in" in2="shape" in="coloredNoise1" result="noise1Clipped" />
|
|
||||||
<feComponentTransfer in="noise1Clipped" result="color1">
|
|
||||||
<feFuncA type="table" tableValues="0 0.06" />
|
|
||||||
</feComponentTransfer>
|
|
||||||
<feMerge result="effect1_noise_15255_46435">
|
|
||||||
<feMergeNode in="shape" />
|
|
||||||
<feMergeNode in="color1" />
|
|
||||||
</feMerge>
|
|
||||||
</filter>
|
|
||||||
<linearGradient id="paint0_linear_15255_46435" x1="0" y1="0" x2="40" y2="40" gradientUnits="userSpaceOnUse">
|
|
||||||
<stop stop-color="#1F4CFF"/>
|
|
||||||
<stop offset="1" stop-color="#0033FF"/>
|
|
||||||
</linearGradient>
|
|
||||||
<clipPath id="clip0_15255_46435">
|
|
||||||
<rect width="40" height="40" fill="white"/>
|
|
||||||
</clipPath>
|
|
||||||
</defs>
|
|
||||||
</svg>
|
|
||||||
|
@@ -1,6 +0,0 @@
|
|||||||
from dify_plugin import Plugin, DifyPluginEnv

plugin = Plugin(DifyPluginEnv(MAX_REQUEST_TIMEOUT=120))

if __name__ == '__main__':
    plugin.run()
|
|
||||||
@@ -1,40 +0,0 @@
|
|||||||
version: 0.0.1
type: plugin
author: yslg
name: pdf
label:
  en_US: pdf
  ja_JP: pdf
  zh_Hans: pdf
  pt_BR: pdf
description:
  en_US: pdfTools
  ja_JP: pdfTools
  zh_Hans: pdfTools
  pt_BR: pdfTools
icon: icon.svg
icon_dark: icon-dark.svg
resource:
  memory: 268435456
  permission:
    tool:
      enabled: true
    model:
      enabled: true
      llm: true
plugins:
  tools:
    - provider/pdf.yaml
meta:
  version: 0.0.1
  arch:
    - amd64
    - arm64
  runner:
    language: python
    version: "3.12"
    entrypoint: main
  minimum_dify_version: null
created_at: 2026-03-02T13:21:03.2806864+08:00
privacy: PRIVACY.md
verified: false
|
|
||||||
@@ -1,64 +0,0 @@
|
|||||||
{
  "name": "pdf-plugin",
  "version": "1.0.0",
  "description": "PDF plugin for analyzing table of contents and extracting text",
  "author": "System",
  "type": "tool",
  "main": "main.py",
  "requirements": "requirements.txt",
  "icon": "https://neeko-copilot.bytedance.net/api/text2image?prompt=PDF%20document%20icon&size=square",
  "settings": [
    {
      "key": "debug",
      "type": "boolean",
      "default": false,
      "description": "Enable debug mode"
    }
  ],
  "functions": [
    {
      "name": "analyze_toc",
      "description": "Analyze PDF and find table of contents",
      "parameters": {
        "type": "object",
        "properties": {
          "file": {
            "type": "file",
            "description": "PDF file to analyze",
            "fileTypes": ["pdf"]
          }
        },
        "required": ["file"]
      }
    },
    {
      "name": "extract_text",
      "description": "Extract text from specified page range",
      "parameters": {
        "type": "object",
        "properties": {
          "file": {
            "type": "file",
            "description": "PDF file to extract text from",
            "fileTypes": ["pdf"]
          },
          "page_range": {
            "type": "object",
            "properties": {
              "start": {
                "type": "integer",
                "default": 0,
                "description": "Start page index"
              },
              "end": {
                "type": "integer",
                "description": "End page index"
              }
            }
          }
        },
        "required": ["file"]
      }
    }
  ]
}
|
|
||||||
@@ -1,53 +0,0 @@
|
|||||||
from typing import Any

from dify_plugin import ToolProvider
from dify_plugin.errors.tool import ToolProviderCredentialValidationError


class PdfProvider(ToolProvider):

    def _validate_credentials(self, credentials: dict[str, Any]) -> None:
        try:
            """
            IMPLEMENT YOUR VALIDATION HERE
            """
        except Exception as e:
            raise ToolProviderCredentialValidationError(str(e))

    #########################################################################################
    # If OAuth is supported, uncomment the following functions.
    # Warning: please make sure that the sdk version is 0.4.2 or higher.
    #########################################################################################
    # def _oauth_get_authorization_url(self, redirect_uri: str, system_credentials: Mapping[str, Any]) -> str:
    #     """
    #     Generate the authorization URL for pdf OAuth.
    #     """
    #     try:
    #         """
    #         IMPLEMENT YOUR AUTHORIZATION URL GENERATION HERE
    #         """
    #     except Exception as e:
    #         raise ToolProviderOAuthError(str(e))
    #     return ""

    # def _oauth_get_credentials(
    #     self, redirect_uri: str, system_credentials: Mapping[str, Any], request: Request
    # ) -> Mapping[str, Any]:
    #     """
    #     Exchange code for access_token.
    #     """
    #     try:
    #         """
    #         IMPLEMENT YOUR CREDENTIALS EXCHANGE HERE
    #         """
    #     except Exception as e:
    #         raise ToolProviderOAuthError(str(e))
    #     return dict()

    # def _oauth_refresh_credentials(
    #     self, redirect_uri: str, system_credentials: Mapping[str, Any], credentials: Mapping[str, Any]
    # ) -> OAuthCredentials:
    #     """
    #     Refresh the credentials
    #     """
    #     return OAuthCredentials(credentials=credentials, expires_at=-1)
|
|
||||||
@@ -1,21 +0,0 @@
|
|||||||
identity:
  author: "yslg"
  name: "pdf"
  label:
    en_US: "pdf"
    zh_Hans: "pdf"
    pt_BR: "pdf"
    ja_JP: "pdf"
  description:
    en_US: "pdfTools"
    zh_Hans: "pdfTools"
    pt_BR: "pdfTools"
    ja_JP: "pdfTools"
  icon: "icon.svg"

tools:
  - tools/pdf_toc.yaml
  - tools/pdf_to_markdown.yaml
extra:
  python:
    source: provider/pdf.py
|
|
||||||
@@ -1,2 +0,0 @@
|
|||||||
dify_plugin>=0.4.0,<0.7.0
pymupdf>=1.27.1
|
|
||||||
@@ -1,234 +0,0 @@
|
|||||||
import json
import re
from collections.abc import Generator
from typing import Any

import fitz
from dify_plugin import Tool
from dify_plugin.entities.tool import ToolInvokeMessage


class PdfToMarkdownTool(Tool):
    """Convert PDF to Markdown using an external catalog array."""

    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
        file = tool_parameters.get("file")
        catalog_text = (tool_parameters.get("catalog") or "").strip()
        if not file:
            yield self.create_text_message("Error: file is required")
            return
        if not catalog_text:
            yield self.create_text_message("Error: catalog is required")
            return

        catalog = self._parse_catalog(catalog_text)
        if not catalog:
            yield self.create_text_message("Error: catalog must be a JSON array with title and page indexes")
            return

        doc = fitz.open(stream=file.blob, filetype="pdf")
        try:
            num_pages = len(doc)
            hf_texts = self._detect_headers_footers(doc, num_pages)
            page_mds = [self._page_to_markdown(doc[index], hf_texts) for index in range(num_pages)]
            final_md = self._assemble_by_catalog(catalog, page_mds, num_pages)

            yield self.create_text_message(final_md)
            yield self.create_blob_message(
                blob=final_md.encode("utf-8"),
                meta={"mime_type": "text/markdown"},
            )
        finally:
            doc.close()

    def _parse_catalog(self, catalog_text: str) -> list[dict[str, Any]]:
        try:
            raw = json.loads(catalog_text)
        except Exception:
            return []

        if not isinstance(raw, list):
            return []

        result: list[dict[str, Any]] = []
        for item in raw:
            if not isinstance(item, dict):
                continue

            title = str(item.get("title") or "").strip() or "Untitled"
            start_index = self._to_int(item.get("page_start_index"), None)
            end_index = self._to_int(item.get("page_end_index"), start_index)

            if start_index is None:
                start = self._to_int(item.get("start"), None)
                end = self._to_int(item.get("end"), start)
                if start is None:
                    continue
                start_index = max(0, start - 1)
                end_index = max(start_index, (end if end is not None else start) - 1)

            if end_index is None:
                end_index = start_index

            result.append(
                {
                    "title": title,
                    "page_start_index": max(0, start_index),
                    "page_end_index": max(start_index, end_index),
                }
            )
        return result

    def _detect_headers_footers(self, doc: fitz.Document, num_pages: int) -> set[str]:
        margin_ratio = 0.08
        sample_count = min(num_pages, 30)
        text_counts: dict[str, int] = {}

        for idx in range(sample_count):
            page = doc[idx]
            page_height = page.rect.height
            top_limit = page_height * margin_ratio
            bottom_limit = page_height * (1 - margin_ratio)
            try:
                blocks = page.get_text("blocks", sort=True) or []
            except Exception:
                continue

            seen: set[str] = set()
            for block in blocks:
                if len(block) < 7 or block[6] != 0:
                    continue
                y0, y1 = block[1], block[3]
                text = (block[4] or "").strip()
                if not text or len(text) < 2 or text in seen:
                    continue
                if y1 <= top_limit or y0 >= bottom_limit:
                    seen.add(text)
                    text_counts[text] = text_counts.get(text, 0) + 1

        threshold = max(3, sample_count * 0.35)
        return {text for text, count in text_counts.items() if count >= threshold}

    def _page_to_markdown(self, page: fitz.Page, hf_texts: set[str]) -> str:
        parts: list[str] = []
        page_height = page.rect.height
        top_margin = page_height * 0.06
        bottom_margin = page_height * 0.94

        table_rects: list[fitz.Rect] = []
        table_mds: list[str] = []
        try:
            find_tables = getattr(page, "find_tables", None)
            tables = []
            if callable(find_tables):
                table_finder = find_tables()
                tables = getattr(table_finder, "tables", []) or []

            for table in tables[:5]:
                try:
                    table_rects.append(fitz.Rect(table.bbox))
                except Exception:
                    pass

                cells = table.extract() or []
                if len(cells) < 2:
                    continue
                if hf_texts and len(cells) <= 3:
                    flat = " ".join(str(cell or "") for row in cells for cell in row)
                    if any(hf in flat for hf in hf_texts):
                        continue

                md_table = self._cells_to_md_table(cells)
                if md_table:
                    table_mds.append(md_table)
        except Exception:
            pass

        try:
            blocks = page.get_text("blocks", sort=True) or []
        except Exception:
            blocks = []

        for block in blocks:
            if len(block) < 7 or block[6] != 0:
                continue
            x0, y0, x1, y1 = block[:4]
            text = (block[4] or "").strip()
            if not text:
                continue

            block_rect = fitz.Rect(x0, y0, x1, y1)
            if any(self._rects_overlap(block_rect, table_rect) for table_rect in table_rects):
                continue
            if hf_texts and (y1 <= top_margin or y0 >= bottom_margin):
                if any(hf in text for hf in hf_texts):
                    continue
            if re.fullmatch(r"\s*\d{1,4}\s*", text):
                continue

            parts.append(text)

        parts.extend(table_mds)
        return "\n\n".join(parts)

    def _assemble_by_catalog(self, catalog: list[dict[str, Any]], page_mds: list[str], num_pages: int) -> str:
        parts: list[str] = []
        used_pages: set[int] = set()

        for item in catalog:
            start = max(0, min(int(item["page_start_index"]), num_pages - 1))
            end = max(start, min(int(item["page_end_index"]), num_pages - 1))

            chapter_parts = [f"# {item['title']}\n"]
            for idx in range(start, end + 1):
                if idx < len(page_mds) and page_mds[idx].strip() and idx not in used_pages:
                    chapter_parts.append(page_mds[idx])
                    used_pages.add(idx)

            if len(chapter_parts) > 1:
                parts.append("\n\n".join(chapter_parts))

        if parts:
            return "\n\n---\n\n".join(parts)
        return "\n\n---\n\n".join(m for m in page_mds if m.strip())

    @staticmethod
    def _rects_overlap(block_rect: fitz.Rect, table_rect: fitz.Rect) -> bool:
        inter = block_rect & table_rect
        if inter.is_empty:
            return False
        block_area = block_rect.width * block_rect.height
        if block_area <= 0:
            return False
        return (inter.width * inter.height) / block_area >= 0.3

    @staticmethod
    def _cells_to_md_table(cells: list) -> str:
        if not cells:
            return ""

        header = cells[0]
        ncols = len(header)
        if ncols == 0:
            return ""

        def clean(value: Any) -> str:
            return str(value or "").replace("|", "\\|").replace("\n", " ").strip()

        lines = [
            "| " + " | ".join(clean(cell) for cell in header) + " |",
            "| " + " | ".join("---" for _ in range(ncols)) + " |",
        ]
        for row in cells[1:]:
            padded = list(row) + [""] * max(0, ncols - len(row))
            lines.append("| " + " | ".join(clean(cell) for cell in padded[:ncols]) + " |")
        return "\n".join(lines)

    @staticmethod
    def _to_int(value: Any, default: int | None) -> int | None:
        try:
            if value is None or value == "":
                return default
            return int(value)
        except Exception:
            return default
|
|
||||||
@@ -1,51 +0,0 @@
|
|||||||
identity:
  name: "pdf_to_markdown"
  author: "yslg"
  label:
    en_US: "PDF to Markdown"
    zh_Hans: "PDF to Markdown"
    pt_BR: "PDF para Markdown"
    ja_JP: "PDF to Markdown"
description:
  human:
    en_US: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
    zh_Hans: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
    pt_BR: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
    ja_JP: "Convert PDF to Markdown using a catalog array. Images and graphics are ignored."
  llm: "Convert a PDF file into Markdown using a catalog JSON array. Ignore images and graphics."
parameters:
  - name: file
    type: file
    required: true
    label:
      en_US: PDF File
      zh_Hans: PDF File
      pt_BR: PDF File
      ja_JP: PDF File
    human_description:
      en_US: "PDF file to convert"
      zh_Hans: "PDF file to convert"
      pt_BR: "PDF file to convert"
      ja_JP: "PDF file to convert"
    llm_description: "PDF file to convert to Markdown"
    form: llm
    fileTypes:
      - "pdf"
  - name: catalog
    type: string
    required: true
    label:
      en_US: Catalog JSON
      zh_Hans: Catalog JSON
      pt_BR: Catalog JSON
      ja_JP: Catalog JSON
    human_description:
      en_US: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
      zh_Hans: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
      pt_BR: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
      ja_JP: "Catalog JSON array like [{title,start,end,page_start_index,page_end_index}]"
    llm_description: "Catalog JSON array returned by pdf_toc"
    form: llm
extra:
  python:
    source: tools/pdf_to_markdown.py
|
|
||||||
@@ -1,312 +0,0 @@
|
|||||||
import json
import re
from collections import OrderedDict
from collections.abc import Generator
from typing import Any

import fitz
from dify_plugin import Tool
from dify_plugin.entities.model.llm import LLMModelConfig
from dify_plugin.entities.model.message import SystemPromptMessage, UserPromptMessage
from dify_plugin.entities.tool import ToolInvokeMessage

# System prompt (kept in Chinese, as sent to the model): asks for all level-1 and
# level-2 headings with their printed page numbers, returned as a bare JSON array
# of {"title", "page"} objects with no code fence or explanation, or [] when no
# table of contents can be identified.
_TOC_SYSTEM_PROMPT = """你是专业的PDF目录解析助手。请从以下PDF文本中提取文档的目录/章节结构。

要求:
1. 识别所有一级和二级标题及其对应的页码
2. 只返回纯JSON数组,不要markdown代码块,不要任何解释
3. 格式: [{"title": "章节标题", "page": 页码数字}]
4. 页码必须是文档中标注的实际页码数字
5. 如果无法识别目录,返回空数组 []"""


class PdfTocTool(Tool):
    _TOC_PATTERNS = [
        r"目录",
        r"目\s*录",
        r"目\u3000录",
        r"Table of Contents",
        r"Contents",
        r"目次",
    ]

    def _invoke(self, tool_parameters: dict[str, Any]) -> Generator[ToolInvokeMessage]:
        file = tool_parameters.get("file")
        if not file:
            yield self.create_text_message("Error: file is required")
            return

        model_config = tool_parameters.get("model")

        doc = fitz.open(stream=file.blob, filetype="pdf")
        try:
            num_pages = len(doc)

            # 1) Prefer the outline stored in the PDF metadata
            catalog = self._catalog_from_metadata(doc.get_toc(), num_pages)

            # 2) No outline in the metadata: parse the TOC with the LLM
            if not catalog and model_config:
                catalog = self._extract_toc_with_llm(doc, num_pages, model_config)

            # 3) No LLM configured, or it returned nothing: fall back to regex parsing
            if not catalog:
                toc_start, toc_end = self._find_toc_pages(doc, num_pages)
                if toc_start is not None and toc_end is not None:
                    toc_text = "\n".join(
                        doc[index].get_text() or "" for index in range(toc_start, toc_end + 1)
                    )
                    printed_catalog = self._parse_toc_lines(toc_text)
                    catalog = self._attach_page_indexes(printed_catalog, toc_end, num_pages)

            if not catalog:
                catalog = []

            yield self.create_text_message(json.dumps(catalog, ensure_ascii=False))
        finally:
            doc.close()

    def _extract_toc_with_llm(
        self, doc: fitz.Document, num_pages: int, model_config: dict[str, Any]
    ) -> list[dict[str, int | str]]:
        # First try to locate the TOC pages
        toc_start, toc_end = self._find_toc_pages(doc, num_pages)

        if toc_start is not None and toc_end is not None:
            # TOC pages found: use their text
            toc_text = "\n".join(
                doc[index].get_text() or "" for index in range(toc_start, toc_end + 1)
            )
            content_offset = toc_end
        else:
            # No TOC pages: send the first 15 pages so the LLM can infer the chapter structure.
            # Each page is prefixed with a numbered marker; the original joined on an
            # unformatted "第{}页" separator, so the literal "{}" reached the model.
            sample = min(num_pages, 15)
            toc_text = "".join(
                f"\n\n--- 第{i + 1}页 ---\n{doc[i].get_text() or ''}" for i in range(sample)
            )
            toc_text = toc_text.strip()
            if not toc_text:
                return []
            content_offset = 0

        # Truncate overly long text
        if len(toc_text) > 15000:
            toc_text = toc_text[:15000] + "\n...[截断]"

        try:
            response = self.session.model.llm.invoke(
                model_config=LLMModelConfig(**model_config),
                prompt_messages=[
                    SystemPromptMessage(content=_TOC_SYSTEM_PROMPT),
                    UserPromptMessage(content=toc_text),
                ],
                stream=False,
            )

            llm_text = self._get_response_text(response)
            if not llm_text:
                return []

            raw_catalog = self._parse_llm_json(llm_text)
            if not raw_catalog:
                return []

            # Convert the simple format returned by the LLM into the full catalog
            return self._build_catalog_from_llm(raw_catalog, content_offset, num_pages)
        except Exception:
            return []

    def _build_catalog_from_llm(
        self, raw: list[dict], content_offset: int, num_pages: int
    ) -> list[dict[str, int | str]]:
        entries: list[tuple[str, int]] = []
        for item in raw:
            title = str(item.get("title") or "").strip()
            page = self._to_int(item.get("page"), None)
            if not title or page is None:
                continue
            entries.append((title, page))

        if not entries:
            return []

        # Offset: difference between the first entry's printed page number and
        # the page where the actual content starts
        first_printed_page = entries[0][1]
        offset = (content_offset + 1) - first_printed_page if content_offset > 0 else 0

        result: list[dict[str, int | str]] = []
        for i, (title, page) in enumerate(entries):
            next_page = entries[i + 1][1] if i + 1 < len(entries) else page
            page_start_index = max(0, min(page + offset - 1, num_pages - 1))
            page_end_index = max(page_start_index, min(next_page + offset - 2, num_pages - 1))
            if i == len(entries) - 1:
                page_end_index = num_pages - 1

            result.append({
                "title": title,
                "start": page,
                "end": max(page, next_page - 1) if i + 1 < len(entries) else page,
                "page_start_index": page_start_index,
                "page_end_index": page_end_index,
            })

        return result

    @staticmethod
    def _get_response_text(response: Any) -> str:
        if not hasattr(response, "message") or not response.message:
            return ""
        content = response.message.content
        if isinstance(content, str):
            text = content
        elif isinstance(content, list):
            text = "".join(
                item.data if hasattr(item, "data") else str(item) for item in content
            )
        else:
            text = str(content)

        # Strip reasoning tags and special tokens
        text = re.sub(r"<think>[\s\S]*?</think>", "", text, flags=re.IGNORECASE)
        text = re.sub(r"<\|[^>]+\|>", "", text)
        return text.strip()

    @staticmethod
    def _parse_llm_json(text: str) -> list[dict]:
        # Try to extract a fenced JSON code block
        code_match = re.search(r"```(?:json)?\s*([\s\S]*?)```", text)
        if code_match:
            text = code_match.group(1).strip()

        # Try to find a JSON array
        bracket_match = re.search(r"\[[\s\S]*\]", text)
        if bracket_match:
            text = bracket_match.group(0)

        try:
            result = json.loads(text)
            if isinstance(result, list):
                return result
        except Exception:
            pass
        return []

    def _catalog_from_metadata(self, toc: list, num_pages: int) -> list[dict[str, int | str]]:
        top = [(title, max(0, page - 1)) for level, title, page in toc if level <= 2 and page >= 1]
        if not top:
            return []

        result: list[dict[str, int | str]] = []
        for index, (title, start_index) in enumerate(top):
            end_index = top[index + 1][1] - 1 if index + 1 < len(top) else num_pages - 1
            result.append({
                "title": title,
                "start": start_index + 1,
                "end": max(start_index, end_index) + 1,
                "page_start_index": start_index,
                "page_end_index": max(start_index, end_index),
            })
        return result

    def _find_toc_pages(self, doc: fitz.Document, num_pages: int) -> tuple[int | None, int | None]:
        toc_start = None
        toc_end = None
        for page_number in range(min(num_pages, 30)):
            text = doc[page_number].get_text() or ""
            if any(re.search(pattern, text, re.IGNORECASE) for pattern in self._TOC_PATTERNS):
                if toc_start is None:
                    toc_start = page_number
                toc_end = page_number
            elif toc_start is not None:
                break
        return toc_start, toc_end

    def _parse_toc_lines(self, text: str) -> list[dict[str, int | str]]:
        marker = re.search(
            r"^(List\s+of\s+Figures|List\s+of\s+Tables|图目录|表目录)",
            text,
            re.IGNORECASE | re.MULTILINE,
        )
        if marker:
            text = text[: marker.start()]

        pattern = re.compile(r"^\s*(?P<title>.+?)\s*(?:\.{2,}|\s)\s*(?P<page>\d{1,5})\s*$")
        entries: list[tuple[str, int]] = []
        for raw in text.splitlines():
            line = raw.strip()
            if not line or len(line) < 3 or re.fullmatch(r"\d+", line):
                continue

            match = pattern.match(line)
            if not match:
                continue

            title = re.sub(r"\s+", " ", match.group("title")).strip("-_::")
            page = self._to_int(match.group("page"), None)
            if not title or page is None or len(title) <= 1:
                continue
            if title.lower() in {"page", "pages", "目录", "contents"}:
                continue

            entries.append((title, page))

        if not entries:
            return []

        dedup: OrderedDict[str, int] = OrderedDict()
        for title, page in entries:
            dedup.setdefault(title, page)

        titles = list(dedup.keys())
        pages = [dedup[title] for title in titles]
        result: list[dict[str, int | str]] = []
        for index, title in enumerate(titles):
            start = pages[index]
            end = max(start, pages[index + 1] - 1) if index + 1 < len(pages) else start
            result.append({"title": title, "start": start, "end": end})
        return result

    def _attach_page_indexes(
        self, catalog: list[dict[str, int | str]], toc_end: int, num_pages: int
    ) -> list[dict[str, int | str]]:
        if not catalog:
            return []

        first_page = None
        for item in catalog:
            start = self._to_int(item.get("start"), None)
            if start is not None and (first_page is None or start < first_page):
                first_page = start

        if first_page is None:
            return []

        offset = (toc_end + 1) - first_page
        result: list[dict[str, int | str]] = []
        for item in catalog:
            start = self._to_int(item.get("start"), None)
            end = self._to_int(item.get("end"), start)
            if start is None:
                continue
            if end is None:
                end = start

            page_start_index = max(0, min(start + offset, num_pages - 1))
            page_end_index = max(page_start_index, min(end + offset, num_pages - 1))
            result.append({
                "title": str(item.get("title") or "Untitled"),
                "start": start,
                "end": max(start, end),
                "page_start_index": page_start_index,
                "page_end_index": page_end_index,
            })
        return result

    @staticmethod
    def _to_int(value: Any, default: int | None) -> int | None:
        try:
            if value is None or value == "":
                return default
            return int(value)
        except Exception:
            return default
|
|
||||||
@@ -1,51 +0,0 @@
|
|||||||
identity:
  name: "pdf_toc"
  author: "yslg"
  label:
    en_US: "PDF TOC"
    zh_Hans: "PDF 目录提取"
    pt_BR: "PDF TOC"
    ja_JP: "PDF TOC"
description:
  human:
    en_US: "Extract the catalog array from a PDF file using metadata or LLM."
    zh_Hans: "从PDF文件中提取目录数组,优先使用元数据,回退使用LLM解析。"
    pt_BR: "Extrair o array de catálogo de um arquivo PDF."
    ja_JP: "PDFファイルからカタログ配列を抽出する。"
  llm: "Extract a catalog array from a PDF file. Returns JSON text like [{title,start,end,page_start_index,page_end_index}]."
parameters:
  - name: file
    type: file
    required: true
    label:
      en_US: PDF File
      zh_Hans: PDF 文件
      pt_BR: PDF File
      ja_JP: PDF File
    human_description:
      en_US: "PDF file to inspect"
      zh_Hans: "要解析的PDF文件"
      pt_BR: "PDF file to inspect"
      ja_JP: "PDF file to inspect"
    llm_description: "PDF file to extract catalog from"
    form: llm
    fileTypes:
      - "pdf"
  - name: model
    type: model-selector
    scope: llm
    required: true
    label:
      en_US: LLM Model
      zh_Hans: LLM 模型
      pt_BR: Modelo LLM
      ja_JP: LLMモデル
    human_description:
      en_US: "LLM model used for parsing TOC when metadata is unavailable"
      zh_Hans: "当元数据不可用时,用于解析目录的LLM模型"
      pt_BR: "Modelo LLM para análise de TOC"
      ja_JP: "メタデータが利用できない場合のTOC解析用LLMモデル"
    form: form
extra:
  python:
    source: tools/pdf_toc.py
|
|
||||||
File diff suppressed because it is too large
@@ -1,122 +0,0 @@
# Dify Plugin Service Requirements

## 1. Project Overview

Develop a Dify plugin service on the FastAPI framework that integrates with the Dify platform, supports deploying and managing multiple plugins, and provides a range of functional extensions.

## 2. Tech Stack

- **Framework**: FastAPI
- **Language**: Python 3.9+
- **Dependency management**: Poetry or pip
- **Deployment**: Docker containers

## 3. Project Architecture

### 3.1 Architecture Design
- **Plugin management system**: manages multiple Dify plugins in one place
- **Plugin loading**: supports dynamic loading and hot reloading of plugins
- **Plugin isolation**: each plugin runs in its own environment
- **API gateway**: a single API entry point that routes requests to the matching plugin

### 3.2 Directory Structure
```
difyPlugin/
├── main.py              # Application entry point
├── requirements.txt     # Dependencies
├── .env                 # Environment configuration
├── app/
│   ├── api/             # API routes
│   ├── core/            # Core configuration
│   ├── plugins/         # Plugin directory
│   │   ├── plugin1/     # Plugin 1
│   │   ├── plugin2/     # Plugin 2
│   │   └── __init__.py  # Plugin loader
│   └── services/        # Shared services
└── tests/               # Tests
```

### 3.3 Plugin Specification
- **Plugin structure**: each plugin has its own configuration, logic, and API
- **Plugin interface**: a uniform interface specification for all plugins
- **Plugin registration**: plugins are discovered and registered automatically
- **Plugin lifecycle**: plugins can be started, stopped, and restarted
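
The plugin specification above names a uniform interface, registration, and a start/stop/restart lifecycle without defining any of them. One possible shape, as a sketch under assumptions: `Plugin`, `PluginRegistry`, and `EchoPlugin` are hypothetical names, and decorator-based registration is just one way to approximate automatic discovery.

```python
from abc import ABC, abstractmethod


class Plugin(ABC):
    """Hypothetical base interface; the requirements do not define one."""

    plugin_id: str = ""
    version: str = "0.1.0"

    def __init__(self):
        self.running = False

    def start(self):
        self.running = True

    def stop(self):
        self.running = False

    def restart(self):
        self.stop()
        self.start()

    @abstractmethod
    def execute(self, payload: dict) -> dict:
        """Run the plugin's business logic."""


class PluginRegistry:
    """Keeps one instance per plugin_id."""

    def __init__(self):
        self._plugins = {}

    def register(self, plugin_cls):
        """Class decorator that instantiates and records a plugin."""
        plugin = plugin_cls()
        if not plugin.plugin_id:
            raise ValueError(f"{plugin_cls.__name__} has no plugin_id")
        if plugin.plugin_id in self._plugins:
            raise ValueError(f"duplicate plugin_id: {plugin.plugin_id}")
        self._plugins[plugin.plugin_id] = plugin
        return plugin_cls

    def get(self, plugin_id):
        return self._plugins[plugin_id]

    def list_ids(self):
        return sorted(self._plugins)


registry = PluginRegistry()


@registry.register
class EchoPlugin(Plugin):
    plugin_id = "echo"

    def execute(self, payload):
        if not self.running:
            raise RuntimeError("plugin is stopped")
        return {"echo": payload}


plugin = registry.get("echo")
plugin.start()
print(plugin.execute({"msg": "hi"}))
```

Registering at class-definition time means that importing a plugin module is enough to make it visible, which fits a loader that simply imports every package under `app/plugins/`.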

## 4. Core Features

### 4.1 Basic Features
- **Health check**: exposes a service status endpoint
- **Version management**: supports plugin versioning
- **Authentication**: secure authentication with Dify
- **Plugin management**: supports registering, starting, stopping, and uninstalling plugins

### 4.2 Business Features
- **Data processing**: converts and processes a variety of data formats
- **External API integration**: connects to third-party service APIs
- **Custom logic**: supports user-defined business logic
- **Event handling**: responds to events triggered by the Dify platform

## 5. API Design

### 5.1 Main Endpoints
- `GET /health`: health check
- `GET /api/v1/plugins`: list plugins
- `GET /api/v1/plugins/{plugin_id}`: get plugin details
- `POST /api/v1/plugins/{plugin_id}/execute`: execute a plugin function
- `GET /api/v1/plugins/{plugin_id}/metadata`: get plugin metadata
- `POST /api/v1/plugins/{plugin_id}/start`: start a plugin
- `POST /api/v1/plugins/{plugin_id}/stop`: stop a plugin

### 5.2 Request/Response Format
- **Request format**: JSON
- **Response format**: JSON, containing a status code and data
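
The response format is only described as JSON carrying a status code and data. A minimal sketch of such an envelope, assuming field names `code`, `message`, and `data` (hypothetical; the requirements fix none of them):

```python
import json
from dataclasses import asdict, dataclass
from typing import Any, Optional


@dataclass
class ApiResponse:
    """Assumed envelope: status code, human-readable message, payload."""

    code: int
    message: str = "ok"
    data: Optional[Any] = None

    def to_json(self) -> str:
        return json.dumps(asdict(self), ensure_ascii=False)


def success(data: Any) -> ApiResponse:
    return ApiResponse(code=200, data=data)


def failure(code: int, message: str) -> ApiResponse:
    return ApiResponse(code=code, message=message)


print(success({"plugins": ["pdf_toc"]}).to_json())
print(failure(404, "plugin not found").to_json())
```

Keeping the same three keys on success and failure lets a caller branch on `code` without first checking which fields are present.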
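
## 6. Deployment Requirements

- **Environment variables**: service parameters are configurable through environment variables
- **Log management**: structured logging is integrated
- **Monitoring metrics**: a Prometheus metrics endpoint is provided
- **Error handling**: thorough error handling and exception capture
- **Plugin isolation**: plugins can be deployed independently and isolated from one another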
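
The first two deployment items can be illustrated with the standard library alone. This is a sketch under assumptions: the variable names `PLUGIN_SERVICE_PORT` and `PLUGIN_SERVICE_LOG_LEVEL`, the logger name, and the JSON field set are all hypothetical.

```python
import json
import logging
import os


def env_int(name, default):
    """Read an integer environment variable, falling back on bad or missing input."""
    raw = os.environ.get(name)
    if raw is None or raw == "":
        return default
    try:
        return int(raw)
    except ValueError:
        return default


class JsonFormatter(logging.Formatter):
    """Render each record as one JSON object per line."""

    def format(self, record):
        return json.dumps({
            "level": record.levelname,
            "logger": record.name,
            "message": record.getMessage(),
        }, ensure_ascii=False)


def configure_logging(level_name):
    """Attach a single JSON-formatting stream handler to the service logger."""
    handler = logging.StreamHandler()
    handler.setFormatter(JsonFormatter())
    logger = logging.getLogger("dify_plugin_service")  # hypothetical logger name
    logger.handlers = [handler]
    logger.setLevel(level_name)
    logger.propagate = False
    return logger


PORT = env_int("PLUGIN_SERVICE_PORT", 8000)  # hypothetical variable name
LOG_LEVEL = os.environ.get("PLUGIN_SERVICE_LOG_LEVEL", "INFO")  # hypothetical

log = configure_logging(LOG_LEVEL)
log.info("service configured on port %s", PORT)
```

Falling back to a default on malformed values keeps a mistyped variable from preventing startup, at the cost of hiding the typo; a stricter service could raise instead.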

## 7. Integration

- **Dify plugin registration**: register according to the Dify plugin specification
- **Webhook configuration**: supports webhook callbacks from the Dify platform
- **Event subscription**: subscribes to Dify platform events
- **Plugin discovery**: new plugins are discovered and registered automatically

## 8. Development Plan

### 8.1 Phase 1: Project Initialization
- Create the FastAPI project structure
- Configure dependency management
- Implement the plugin management system

### 8.2 Phase 2: Core Feature Development
- Implement the plugin loading mechanism
- Develop the plugin interface specification
- Implement data processing
- Integrate external APIs

### 8.3 Phase 3: Testing and Deployment
- Write unit tests
- Run integration tests
- Deploy in containers
- Develop example plugins

## 9. Technical Requirements

- **Code quality**: follow the PEP 8 style guide
- **Documentation**: complete API documentation
- **Performance**: optimize response time and resource usage
- **Security**: implement secure authentication and authorization
- **Extensibility**: support adding and removing plugins dynamically

## 10. Deliverables

- **Source code**: the complete project code
- **Deployment documentation**: detailed deployment steps
- **API documentation**: automatically generated API docs
- **Test report**: test results and coverage report
- **Plugin development guide**: a guide to developing and registering plugins
Binary file not shown.