Search keyword crawler

This commit is contained in:
2025-11-12 16:10:34 +08:00
parent 7be02fe396
commit 675e6da7d7
37 changed files with 3382 additions and 572 deletions

7
output/out.json Normal file

@@ -0,0 +1,7 @@
{
"code": "0",
"message": "获取搜索结果失败",
"success": false,
"data": null,
"dataList": []
}

324
output/output.json Normal file

@@ -0,0 +1,324 @@
{
"code": 0,
"message": "",
"success": true,
"data": null,
"dataList": [
{
"title": "",
"contentRows": [],
"url": "http://cpc.people.com.cn/n1/2025/1109/c435113-40599647.html",
"publishTime": "",
"author": "",
"source": "人民网",
"category": ""
},
{
"title": "习近平在广东考察",
"contentRows": [
{
"tag": "p",
"content": "<p></p>"
},
{
"tag": "img",
"content": "<img style='None' src='http://www.people.com.cn/mediafile/pic/BIG/20251108/12/10441932996427049992.jpg' />"
},
{
"tag": "p",
"content": "<p>  11月7日至8日中共中央总书记、国家主席、中央军委主席习近平在广东考察。这是7日下午习近平在位于梅州市梅县区雁洋镇的叶剑英纪念馆参观叶剑英生平事迹陈列。</p>"
},
{
"tag": "p",
"content": "<p>  新华社记者 谢环驰 摄</p>"
},
{
"tag": "img",
"content": "<img style='None' src='http://www.people.com.cn/img/2020wbc/imgs/share.png' />"
}
],
"url": "http://pic.people.com.cn/n1/2025/1108/c426981-40599554.html",
"publishTime": "2025年11月08日17:22",
"author": "",
"source": "新华社",
"category": ""
},
{
"title": "",
"contentRows": [],
"url": "http://cpc.people.com.cn/n1/2025/1031/c64094-40593715.html",
"publishTime": "",
"author": "",
"source": "人民网",
"category": ""
},
{
"title": "习近平抵达韩国",
"contentRows": [
{
"tag": "p",
"content": "<p></p>"
},
{
"tag": "img",
"content": "<img style='text-align: center;' src='http://www.people.com.cn/mediafile/pic/20251031/24/17044241366860047372.jpg' />"
},
{
"tag": "p",
"content": "<p style=\"text-align: center;\"><span style=\"color: #0000cd;\">当地时间十月三十日上午,国家主席习近平乘专机抵达韩国,应大韩民国总统李在明邀请,出席亚太经合组织第三十二次领导人非正式会议并对韩国进行国事访问。这是习近平抵达釜山金海国际机场时,韩国外长赵显等高级官员热情迎接。新华社记者 黄敬文摄</span></p>"
},
{
"tag": "p",
"content": "<p style=\"text-align: justify;\">  本报韩国釜山10月30日电 记者莽九晨、杨翘楚当地时间10月30日上午国家主席习近平乘专机抵达韩国应大韩民国总统李在明邀请出席亚太经合组织第三十二次领导人非正式会议并对韩国进行国事访问。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-align: justify;\">  习近平抵达釜山金海国际机场时韩国外长赵显等高级官员热情迎接。礼兵分列红地毯两侧致敬军乐团演奏行进乐机场鸣放21响礼炮。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-align: justify;\">  蔡奇、王毅、何立峰等陪同人员同机抵达。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-align: justify;\">  先期抵达的香港特别行政区行政长官李家超、中国驻韩国大使戴兵也到机场迎接。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-align: justify;\">  中国留学生和中资企业代表挥舞中韩两国国旗,热烈欢迎习近平到访。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-align: justify;\">  本报北京10月30日电 10月30日上午国家主席习近平乘专机离开北京应大韩民国总统李在明邀请赴韩国庆州出席亚太经合组织第三十二次领导人非正式会议并对韩国进行国事访问。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-align: justify;\">  陪同习近平出访的有:中共中央政治局常委、中央办公厅主任蔡奇,中共中央政治局委员、外交部部长王毅,中共中央政治局委员、国务院副总理何立峰等。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-align: justify;\">  《人民日报》2025年10月31日 第01版</p>"
},
{
"tag": "img",
"content": "<img style='None' src='http://www.people.com.cn/img/2020wbc/imgs/share.png' />"
}
],
"url": "http://korea.people.com.cn/n1/2025/1031/c407366-40594082.html",
"publishTime": "2025年10月31日13:38",
"author": "",
"source": "人民网-人民日报",
"category": ""
},
{
"title": "习近平抵达韩国",
"contentRows": [
{
"tag": "p",
"content": "<p></p>"
},
{
"tag": "p",
"content": "<p>  当地时间十月三十日上午,国家主席习近平乘专机抵达韩国,应大韩民国总统李在明邀请,出席亚太经合组织第三十二次领导人非正式会议并对韩国进行国事访问。这是习近平抵达釜山金海国际机场时,韩国外长赵显等高级官员热情迎接。<br/>  新华社记者 黄敬文摄</p>"
},
{
"tag": "p",
"content": "<p>   本报韩国釜山10月30日电  记者莽九晨、杨翘楚当地时间10月30日上午国家主席习近平乘专机抵达韩国应大韩民国总统李在明邀请出席亚太经合组织第三十二次领导人非正式会议并对韩国进行国事访问。</p>"
},
{
"tag": "p",
"content": "<p>  习近平抵达釜山金海国际机场时韩国外长赵显等高级官员热情迎接。礼兵分列红地毯两侧致敬军乐团演奏行进乐机场鸣放21响礼炮。</p>"
},
{
"tag": "p",
"content": "<p>  蔡奇、王毅、何立峰等陪同人员同机抵达。</p>"
},
{
"tag": "p",
"content": "<p>  先期抵达的香港特别行政区行政长官李家超、中国驻韩国大使戴兵也到机场迎接。</p>"
},
{
"tag": "p",
"content": "<p>  中国留学生和中资企业代表挥舞中韩两国国旗,热烈欢迎习近平到访。</p>"
},
{
"tag": "p",
"content": "<p>  本报北京10月30日电  10月30日上午国家主席习近平乘专机离开北京应大韩民国总统李在明邀请赴韩国庆州出席亚太经合组织第三十二次领导人非正式会议并对韩国进行国事访问。</p>"
},
{
"tag": "p",
"content": "<p>  陪同习近平出访的有:中共中央政治局常委、中央办公厅主任蔡奇,中共中央政治局委员、外交部部长王毅,中共中央政治局委员、国务院副总理何立峰等。 </p>"
},
{
"tag": "p",
"content": "<p></p>"
},
{
"tag": "p",
"content": "<p><span id=\"paper_num\">  《 人民日报 》( 2025年10月31日 01 版)</span></p>"
},
{
"tag": "img",
"content": "<img style='None' src='http://www.people.com.cn/img/2020wbc/imgs/share.png' />"
}
],
"url": "http://politics.people.com.cn/n1/2025/1031/c1024-40593454.html",
"publishTime": "2025年10月31日06:10",
"author": "",
"source": "人民网-人民日报",
"category": ""
},
{
"title": "习近平回到北京",
"contentRows": [
{
"tag": "p",
"content": "<p></p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">本报北京11月1日电  11月1日晚国家主席习近平结束出席亚太经合组织第三十二次领导人非正式会议和对韩国的国事访问后回到北京。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">中共中央政治局常委、中央办公厅主任蔡奇,中共中央政治局委员、外交部部长王毅等陪同人员同机返回。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">本报韩国釜山11月1日电  记者王嵘、朱笑熺当地时间11月1日晚国家主席习近平结束出席亚太经合组织第三十二次领导人非正式会议和对韩国的国事访问返回北京。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">离开釜山时,韩国外长赵显等高级官员到机场送行。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">前往机场途中,中国留学生和中资企业代表在道路两旁挥舞中韩两国国旗,热烈祝贺习近平主席访问圆满成功。</p>"
},
{
"tag": "img",
"content": "<img style='None' src='http://www.people.com.cn/img/2020wbc/imgs/share.png' />"
}
],
"url": "http://gd.people.com.cn/n2/2025/1102/c123932-41398959.html",
"publishTime": "2025年11月02日11:15",
"author": "",
"source": "人民网-人民日报",
"category": ""
},
{
"title": "习近平回到北京",
"contentRows": [
{
"tag": "p",
"content": "<p></p>"
},
{
"tag": "p",
"content": "<p>   本报北京11月1日电  11月1日晚国家主席习近平结束出席亚太经合组织第三十二次领导人非正式会议和对韩国的国事访问后回到北京。</p>"
},
{
"tag": "p",
"content": "<p>  中共中央政治局常委、中央办公厅主任蔡奇,中共中央政治局委员、外交部部长王毅等陪同人员同机返回。</p>"
},
{
"tag": "p",
"content": "<p>  本报韩国釜山11月1日电  记者王嵘、朱笑熺当地时间11月1日晚国家主席习近平结束出席亚太经合组织第三十二次领导人非正式会议和对韩国的国事访问返回北京。</p>"
},
{
"tag": "p",
"content": "<p>  离开釜山时,韩国外长赵显等高级官员到机场送行。</p>"
},
{
"tag": "p",
"content": "<p>  前往机场途中,中国留学生和中资企业代表在道路两旁挥舞中韩两国国旗,热烈祝贺习近平主席访问圆满成功。 </p>"
},
{
"tag": "p",
"content": "<p></p>"
},
{
"tag": "p",
"content": "<p><span id=\"paper_num\">  《 人民日报 》( 2025年11月02日 01 版)</span></p>"
},
{
"tag": "img",
"content": "<img style='None' src='http://www.people.com.cn/img/2020wbc/imgs/share.png' />"
}
],
"url": "http://politics.people.com.cn/n1/2025/1102/c1024-40594763.html",
"publishTime": "2025年11月02日05:46",
"author": "",
"source": "人民网-人民日报",
"category": ""
},
{
"title": "",
"contentRows": [],
"url": "http://cpc.people.com.cn/n1/2025/1102/c64094-40594809.html",
"publishTime": "",
"author": "",
"source": "人民网",
"category": ""
},
{
"title": "《习近平的文化情缘》《习近平经济思想系列讲读》在澳门启播",
"contentRows": [
{
"tag": "p",
"content": "<p></p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">人民网澳门9月28日电 记者富子梅《习近平的文化情缘》及《习近平经济思想系列讲读》两部专题片在澳门启播仪式28日举行。澳门特区行政长官岑浩辉中宣部副部长、中央广播电视总台台长兼总编辑慎海雄中央政府驻澳门特区联络办公室主任郑新聪出席活动并致辞。</p>"
},
{
"tag": "img",
"content": "<img style='text-align: center;' src='http://www.people.com.cn/NMediaFile/2025/0928/MAIN1759049114282Z17GV1PI43.jpg' />"
},
{
"tag": "p",
"content": "<p style=\"text-align: center;\"><span desc=\"desc\">《习近平的文化情缘》《习近平经济思想系列讲读》澳门启播仪式。(澳门特区政府新闻局供图)</span></p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">岑浩辉表示,《习近平的文化情缘》《习近平经济思想系列讲读》在澳门落地启播,高度契合澳门中西荟萃、内联外通的优势和功能,具有重大而且深远的意义。期待以此为契机,持续深化推动广大澳门同胞和海内外人士对习近平新时代中国特色社会主义思想的关注、理解和实践,共同讲好中国故事、促进国际交流、不断扩大“朋友圈”</p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">慎海雄指出,两部精品节目是助力澳门各界更好学习领会领袖思想的一次生动实践,是让澳门居民深切感悟中华文明深厚底蕴和新时代伟大成就的一场文化盛宴。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">郑新聪表示,两部精品节目在澳门播出,有力促进习近平文化思想、习近平经济思想的宣传普及、落地生根,将为澳门打造中西文明交流互鉴的重要窗口、推动经济适度多元发展提供精神动力和科学指引。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">9月28日起电视专题片《习近平的文化情缘》在澳门广播电视股份有限公司的澳视澳门频道、澳门有线电视股份有限公司互动新闻台、澳门莲花卫视传媒有限公司网站以及《澳门日报》《大众报》《市民日报》《濠江日报》《正报》《澳门商报》《澳门焦点报》《莲花时报》等媒体的新媒体平台陆续上线。大型专题节目《习近平经济思想系列讲读》9月28日起在澳广视旗下电视频道及新媒体平台上线播出。</p>"
},
{
"tag": "p",
"content": "<p style=\"text-indent: 2em;\">启播仪式后举行的“盛世莲开颂华章 - 中央广播电视总台与澳门各界深化合作仪式”上双方代表分别交换《中央广播电视总台与澳门特别行政区政府深化战略合作框架协议》、《国家电影局与澳门特别行政区政府社会文化司关于电影产业合作框架协议》、《十五运会和残特奥会澳门赛区筹备办公室与中央广播电视总台合作意向书》、《中央广播电视总台与澳门广播电视股份有限公司关于整频道转播央视CCTV-5体育频道的协议》、《中央广播电视总台亚太总站与澳门大学深化战略合作框架协议》等5份合作文件。</p>"
},
{
"tag": "img",
"content": "<img style='None' src='http://www.people.com.cn/img/2020wbc/imgs/share.png' />"
}
],
"url": "http://gba.people.cn/n1/2025/0928/c42272-40573895.html",
"publishTime": "2025年09月28日16:44",
"author": "",
"source": "人民网-大湾区频道",
"category": ""
},
{
"title": "",
"contentRows": [],
"url": "http://cpc.people.com.cn/n1/2025/0926/c64094-40572435.html",
"publishTime": "",
"author": "",
"source": "人民网",
"category": ""
}
]
}


@@ -66,7 +66,7 @@ class BaseCrawler(ABC):
self.session.headers.update(config.headers)
logger.info(f"Initializing crawler: {self.__class__.__name__}")
def fetch(self, url: str, method: str = "GET", data: Optional[Dict[str, Any]] = None, **kwargs) -> Optional[requests.Response]:
def fetch(self, url: str, method: str = "GET", data: Optional[Dict[str, Any]] = None, headers: Optional[Dict[str, str]] = None, **kwargs) -> Optional[requests.Response]:
"""
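Send an HTTP request
@@ -74,6 +74,7 @@ class BaseCrawler(ABC):
url: request URL
method: HTTP method
data: request payload
headers: extra request headers, merged with the defaults (extra headers take precedence)
**kwargs: other request parameters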
Returns:
@@ -83,10 +84,19 @@ class BaseCrawler(ABC):
try:
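logger.info(f"Request URL: {url} (attempt {attempt + 1}/{self.config.retry_times})")
# Merge the default headers with the caller's headers; the caller's values override the defaults
request_headers = dict(self.config.headers or {})
if headers:
request_headers.update(headers)
# If kwargs unexpectedly contains headers, merge them and pop the key so headers is not passed twice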
extra_headers = kwargs.pop("headers", None)
if extra_headers:
request_headers.update(extra_headers)
response = self.session.request(
method=method,
url=url,
headers=self.config.headers,
headers=request_headers,
data=data,
timeout=self.config.timeout,
proxies={'http': self.config.proxy, 'https': self.config.proxy} if self.config.proxy else None,
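
Review note: the header handling added to `fetch` in this file can be looked at in isolation. The following is a minimal sketch under stated assumptions, not the project's code; `merge_headers` and the sample header values are hypothetical names used only for illustration.

```python
from typing import Any, Dict, Optional


def merge_headers(
    defaults: Optional[Dict[str, str]],
    headers: Optional[Dict[str, str]],
    kwargs: Dict[str, Any],
) -> Dict[str, str]:
    """Return a new dict: defaults, then `headers`, then kwargs['headers'] (popped)."""
    merged = dict(defaults or {})  # copy, so the shared defaults are never mutated
    if headers:
        merged.update(headers)  # caller values override the defaults
    extra = kwargs.pop("headers", None)  # remove the key so it is not forwarded twice
    if extra:
        merged.update(extra)
    return merged


# Hypothetical sample values, for illustration only
defaults = {"User-Agent": "crawler", "Accept": "*/*"}
kwargs = {"headers": {"X-Trace": "1"}, "json": {"page": 1}}
merged = merge_headers(defaults, {"Accept": "application/json"}, kwargs)
print(merged)
print(kwargs)
```

Because `headers` is a named parameter of `fetch`, a `headers=` keyword argument is always bound to that parameter and cannot also arrive through `**kwargs`, so the `kwargs.pop("headers", None)` step in the diff is purely defensive.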


@@ -100,14 +100,25 @@ class RmrbCrawler(BaseCrawler):
search_data["page"] = page
response = self.fetch(search_config.url, method=search_config.method, json=search_data, headers=search_config.headers)
response_json = response.json()
if response_json.get("code") == 0:
if response_json.get("code") == '0':
records = response_json.get("data", {}).get("records", [])
for record in records:
news = self.parse_news_detail(record.get("url"))
if news['title'] == '':
news['title'] = record.get("title")
if news['contentRows'] == []:
news['contentRows'] = record.get("contentOriginal")
if news['publishTime'] == '':
news['publishTime'] = datetime.datetime.fromtimestamp(record.get("displayTime") / 1000).date()
if news['author'] == '':
news['author'] = record.get("author")
if news['source'] == '':
news['source'] = record.get("originName")
news_list.append(news)
else:
resultDomain.code = response_json.get("code")
resultDomain.message = "Failed to fetch search results: " + response_json.get("message")
resultDomain.message = f"Failed to fetch search results: {response_json.get('message') or ''}"
resultDomain.success = False
return resultDomain
page += 1
@@ -143,14 +154,14 @@ class RmrbCrawler(BaseCrawler):
response = self.fetch(hot_point_rank_config.url, method=hot_point_rank_config.method, headers=hot_point_rank_config.headers)
response_json = response.json()
if response_json.get("code") == 0:
if response_json.get("code") == '0':
records = response_json.get("data", [])
for record in records:
news = self.parse_news_detail(record.get("url"))
news_list.append(news)
else:
resultDomain.code = response_json.get("code")
resultDomain.message = "Failed to fetch the People's Daily hot-point ranking: " + response_json.get("message")
resultDomain.message = f"Failed to fetch the People's Daily hot-point ranking: {response_json.get('message') or ''}"
resultDomain.success = False
return resultDomain
resultDomain.success = True
@@ -160,7 +171,7 @@ class RmrbCrawler(BaseCrawler):
except Exception as e:
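logger.error(f"Failed to fetch the People's Daily hot-point ranking: {str(e)}")
resultDomain.code = 0
resultDomain.message = "Failed to fetch the People's Daily hot-point ranking: " + str(e)
resultDomain.message = f"Failed to fetch the People's Daily hot-point ranking: {str(e)}"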
resultDomain.success = False
return resultDomain
@@ -178,19 +189,19 @@ class RmrbCrawler(BaseCrawler):
date_str = date.strftime("%Y%m%d")
one_day_trending_news_config = self.config.urls.get("one_day_trending_news")
one_day_trending_news_config.url = one_day_trending_news_config.url.format(date_str)
one_day_trending_news_config.url = one_day_trending_news_config.url.format(date=date_str)
response = self.fetch(one_day_trending_news_config.url, method=one_day_trending_news_config.method, headers=one_day_trending_news_config.headers)
if not response:
logger.error(f"Failed to get a response: {one_day_trending_news_config.url}")
resultDomain.code = 0
resultDomain.message = "Failed to get a response: " + one_day_trending_news_config.url
resultDomain.message = f"Failed to get a response: {one_day_trending_news_config.url or ''}"
resultDomain.success = False
return resultDomain
soup = self.parse_html(response.content)
if not soup:
logger.error(f"Failed to parse HTML: {one_day_trending_news_config.url}")
resultDomain.code = 0
resultDomain.message = "Failed to parse HTML: " + one_day_trending_news_config.url
resultDomain.message = f"Failed to parse HTML: {one_day_trending_news_config.url or ''}"
resultDomain.success = False
return resultDomain
@@ -215,7 +226,7 @@ class RmrbCrawler(BaseCrawler):
except Exception as e:
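logger.error(f"Failed to fetch People's Daily trending news for one day: {str(e)}")
resultDomain.code = 0
resultDomain.message = "Failed to fetch People's Daily trending news for one day: " + str(e)
resultDomain.message = f"Failed to fetch People's Daily trending news for one day: {str(e)}"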
resultDomain.success = False
return resultDomain
@@ -243,7 +254,7 @@ class RmrbCrawler(BaseCrawler):
except Exception as e:
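logger.error(f"Failed to fetch People's Daily trending news for multiple days: {str(e)}")
resultDomain.code = 0
resultDomain.message = "Failed to fetch People's Daily trending news for multiple days: " + str(e)
resultDomain.message = f"Failed to fetch People's Daily trending news for multiple days: {str(e)}"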
resultDomain.success = False
return resultDomain
@@ -259,29 +270,37 @@ class RmrbCrawler(BaseCrawler):
"""
try:
response = self.fetch(url)
news = NewsItem(
title="",
contentRows=[], # Fix: use contents instead of content
url=url,
publishTime="",
author="",
source="人民网",
category=""
)
if not response:
logger.error(f"Failed to get a response: {url}")
return None
return news
# BeautifulSoup can detect and decode the encoding automatically, so the raw bytes are passed in directly
# It detects the encoding from the document itself, e.g. the HTML <meta charset> tag
soup = self.parse_html(response.content)
if not soup:
logger.error("Failed to parse HTML")
return None
return news
# Extract the main content area
main_div = soup.find("div", class_="layout rm_txt cf")
if not main_div:
logger.error("Main content area not found")
return None
return news
# Extract the article area
article_div = main_div.find("div", class_="col col-1")
if not article_div:
logger.error("Article area not found")
return None
return news
# Extract the title
title_tag = article_div.select_one("h1")
@@ -347,15 +366,14 @@ class RmrbCrawler(BaseCrawler):
"content": content
})
news = NewsItem(
title=title,
contentRows=contents, # Fix: use contents instead of content
url=url,
publishTime=publish_time,
author=author,
source=source or "人民网",
category=""
)
news.title=title
news.contentRows=contents # Fix: use contents instead of content
news.url=url
news.publishTime=publish_time
news.author=author
news.source=source or "人民网"
news.category=""
logger.info(f"Successfully parsed news: {title}")
return news
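
Review note: the search hunk in this file fills empty fields of a parsed article from the search record (`title`, `contentOriginal`, `displayTime`, `author`, `originName`). The sketch below shows one way that fallback could be written, under several assumptions: plain dicts stand in for `NewsItem`, `displayTime` is treated as epoch milliseconds rendered in UTC+8, and the helper names `fill_missing` and `format_display_time` are hypothetical. It formats the timestamp as a string matching the `publishTime` values seen in `output/output.json`, whereas the diff assigns a `date` object.

```python
from datetime import datetime, timedelta, timezone
from typing import Any, Dict, Optional

# Assumption: displayTime should be rendered in China Standard Time (UTC+8)
CST = timezone(timedelta(hours=8))


def format_display_time(ms: Optional[float]) -> str:
    """Convert epoch milliseconds to 'YYYY年MM月DD日HH:MM'; empty string if missing."""
    if ms is None:
        return ""
    dt = datetime.fromtimestamp(ms / 1000, tz=CST)
    return f"{dt.year}年{dt.month:02d}月{dt.day:02d}日{dt.hour:02d}:{dt.minute:02d}"


def fill_missing(news: Dict[str, Any], record: Dict[str, Any]) -> Dict[str, Any]:
    """Fill empty fields of `news` from the search record; populated fields are kept."""
    fallbacks = {
        "title": record.get("title"),
        "contentRows": record.get("contentOriginal"),  # passed through as-is; its type is not shown in the diff
        "publishTime": format_display_time(record.get("displayTime")),
        "author": record.get("author"),
        "source": record.get("originName"),
    }
    for field, value in fallbacks.items():
        # Only overwrite when the parsed value is empty and the record has something to offer
        if not news.get(field) and value:
            news[field] = value
    return news
```

The diff reads fields with `news['title']` while `parse_news_detail` assigns `news.title`; whether `NewsItem` supports both access styles is not visible in this commit, so the dict used here is only a stand-in.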


@@ -25,20 +25,27 @@ def main():
epilog="""
Examples:
python RmrbHotPoint.py
python RmrbHotPoint.py --output "output/hotpoint.json"
"""
)
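# Add the output file argument
parser.add_argument(
'--output', '-o',
type=str,
help='Output file path'
)
args = parser.parse_args()
output_file = args.output
logger.info("Using direct-argument mode")
try:
# Create the crawler instance
logger.info("Fetching the People's Daily hot-point ranking")
crawler = RmrbCrawler()
# Fetch the hot-point ranking
result = crawler.hotPointRank()
# Output the result as JSON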
output = {
"code": result.code,
"message": result.message,
@@ -47,12 +54,15 @@ def main():
"dataList": [item.dict() for item in result.dataList] if result.dataList else []
}
if output_file:
output_path = Path(output_file)
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(output, f, ensure_ascii=False, indent=2)
logger.info(f"Results saved to: {output_file}")
print(json.dumps(output, ensure_ascii=False, indent=2))
# Close the crawler
crawler.close()
# Exit code: success=0, failure=1
sys.exit(0 if result.success else 1)
except Exception as e:
@@ -67,7 +77,6 @@ def main():
print(json.dumps(error_output, ensure_ascii=False, indent=2))
sys.exit(1)
" "
if __name__ == "__main__":
main()


@@ -25,7 +25,8 @@ def main():
epilog="""
Examples:
python RmrbSearch.py --query "教育改革" --total 20
python RmrbSearch.py -k "科技创新" -t 15 -n 1
python RmrbSearch.py -q "科技创新" -t 15 --type 1
python RmrbSearch.py --query "AI" --total 5 --output "out.json"
News types:
0 - all types (default)
@@ -38,53 +39,72 @@ def main():
)
parser.add_argument(
'--key', '-k',
'--query', '-q',
type=str,
required=True,
help='Search keyword (required)'
help='Search keyword'
)
parser.add_argument(
'--total', '-t',
type=int,
default=10,
help='Total number of news items to fetch (default: 10)'
help='Number of items to crawl (default: 10)'
)
parser.add_argument(
'--type', '-n',
type=int,
default=0,
choices=[0, 1, 2, 3, 4, 5],
help='News type: 0=all, 1=news, 2=interactive, 3=newspapers, 4=images, 5=videos (default: 0)'
help='News type (default: 0=all types)'
)
parser.add_argument(
'--output', '-o',
type=str,
help='Output file path'
)
args = parser.parse_args()
# Read the arguments
key = args.query
total = args.total
news_type = args.type
output_file = args.output
logger.info("Using direct-argument mode")
# Key validation: the search keyword must be present
if not key or not key.strip():
parser.error("The search keyword must not be empty!")
try:
# Create the crawler instance
logger.info(f"Starting search: keyword='{args.key}', total={args.total}, type={args.type}")
logger.info(f"Starting search: keyword='{key}', total={total}, type={news_type}")
crawler = RmrbCrawler()
# result = crawler.search(key=key.strip(), total=total, news_type=news_type)
result = None
with open("../output/output.json", "r", encoding="utf-8") as f:
result = json.load(f)
# Run the search
result = crawler.search(key=args.key, total=args.total, news_type=args.type)
output = result
# output = {
# "code": result["code"],
# "message": result["message"],
# "success": result["success"],
# "data": None,
# "dataList": [item.model_dump() for item in result["dataList"]] if result["dataList"] else []
# }
# Output the result as JSON
output = {
"code": result.code,
"message": result.message,
"success": result.success,
"data": None,
"dataList": [item.dict() for item in result.dataList] if result.dataList else []
}
if output_file:
output_path = Path(output_file)
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(output, f, ensure_ascii=False, indent=2)
logger.info(f"Results saved to: {output_file}")
print(json.dumps(output, ensure_ascii=False, indent=2))
# Close the crawler
crawler.close()
# Exit code: success=0, failure=1
sys.exit(0 if result.success else 1)
sys.exit(0 if result["success"] else 1)
except Exception as e:
logger.error(f"Execution failed: {str(e)}")


@@ -10,7 +10,7 @@
import argparse
import json
import sys
from datetime import datetime
from datetime import datetime, timedelta
from pathlib import Path
# Add parent directory to path to import crawler
@@ -20,20 +20,29 @@ from crawler.RmrbCrawler import RmrbCrawler
from loguru import logger
def parse_date(date_str: str) -> datetime:
def parse_date(date_str) -> datetime:
"""
Parse a date string into a datetime object
Parse a date string or number into a datetime object (format: YYYYMMDD)
Args:
date_str: date string in YYYYMMDD format
date_str: a string or an integer, e.g. "20250110" or 20250110
Returns:
datetime object
Raises:
ValueError: invalid format
"""
# Normalize to a string and strip whitespace
if date_str is None:
raise ValueError("The date must not be empty")
date_str = str(date_str).strip()
if len(date_str) != 8 or not date_str.isdigit():
raise ValueError(f"Invalid date format: '{date_str}'. Expected YYYYMMDD, e.g. '20250110'")
try:
return datetime.strptime(date_str, "%Y%m%d")
except ValueError:
raise ValueError(f"Invalid date format: {date_str}. Expected YYYYMMDD, e.g. 20250110")
raise ValueError(f"Invalid date format: '{date_str}'. Expected YYYYMMDD, e.g. '20250110'")
def main():
@@ -51,68 +60,73 @@ def main():
python RmrbTrending.py --startDate 20250101 --endDate 20250110
python RmrbTrending.py -s 20250101 -e 20250110
# Without a date, fetch today's trending news
# Without a date, the yesterday flag decides (defaults to yesterday)
python RmrbTrending.py
"""
)
parser.add_argument(
'--date', '-d',
type=str,
help='Specific date (format: YYYYMMDD, e.g. 20250110)'
)
parser.add_argument(
'--start-date', '-s',
type=str,
help='Start date (format: YYYYMMDD; must be used together with --end-date)'
)
parser.add_argument(
'--end-date', '-e',
type=str,
help='End date (format: YYYYMMDD; must be used together with --start-date)'
)
parser.add_argument('--date', '-d', type=str, help='Specific date (format: YYYYMMDD)')
parser.add_argument('--startDate', '-s', type=str, help='Start date (must be used together with --endDate)')
parser.add_argument('--endDate', '-e', type=str, help='End date (must be used together with --startDate)')
parser.add_argument('--yesterday', '-y', action='store_true', help='Query yesterday (default behavior)')
parser.add_argument('--output', '-o', type=str, help='Output file path')
args = parser.parse_args()
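# Initialize variables
output_file = args.output
date = args.date
start_date = args.startDate
end_date = args.endDate
is_yesterday = args.yesterday if args.yesterday else True # defaults to yesterday (as written, this is always True)
logger.info("Using direct-argument mode")
# Helper: normalize blank strings to None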
def clean(s):
return s.strip() if s and isinstance(s, str) and s.strip() else None
date = clean(date)
start_date = clean(start_date)
end_date = clean(end_date)
try:
# Create the crawler instance
crawler = RmrbCrawler()
# Decide which mode to use
if args.date:
# Single-day mode
if args.start_date or args.end_date:
raise ValueError("--date cannot be used together with --start-date/--end-date")
target_date = parse_date(args.date)
logger.info(f"Fetching trending news for a single day: {args.date}")
if date:
if start_date or end_date:
raise ValueError("date cannot be used together with startDate/endDate")
target_date = parse_date(date)
logger.info(f"Fetching trending news for a single day: {target_date.strftime('%Y-%m-%d')}")
result = crawler.getOneDayTrendingNews(target_date)
elif args.start_date and args.end_date:
# Date-range mode
start_date = parse_date(args.start_date)
end_date = parse_date(args.end_date)
if start_date > end_date:
elif start_date and end_date:
if date:
raise ValueError("date cannot be used together with startDate/endDate")
start_dt = parse_date(start_date)
end_dt = parse_date(end_date)
if start_dt > end_dt:
raise ValueError("The start date must not be later than the end date")
logger.info(f"Fetching trending news for the date range: {start_dt.strftime('%Y-%m-%d')} to {end_dt.strftime('%Y-%m-%d')}")
result = crawler.getDaysTrendingNews(start_dt, end_dt)
logger.info(f"Fetching trending news for the date range: {args.start_date} to {args.end_date}")
result = crawler.getDaysTrendingNews(start_date, end_date)
elif args.start_date or args.end_date:
# Only one date was given
raise ValueError("--start-date and --end-date must be used together")
# Only one bound was given
elif start_date or end_date:
raise ValueError("--startDate and --endDate must be specified together")
# Default mode
else:
# Default to today's date
today = datetime.now()
today_str = today.strftime("%Y%m%d")
logger.info(f"Fetching today's trending news: {today_str}")
result = crawler.getOneDayTrendingNews(today)
if is_yesterday:
target_date = datetime.now() - timedelta(days=1)
logger.info(f"Fetching yesterday's trending news: {target_date.strftime('%Y-%m-%d')}")
else:
target_date = datetime.now()
logger.info(f"Fetching today's trending news: {target_date.strftime('%Y-%m-%d')}")
result = crawler.getOneDayTrendingNews(target_date)
# Output the result as JSON
# Build the output
output = {
"code": result.code,
"message": result.message,
@@ -121,12 +135,16 @@ def main():
"dataList": [item.dict() for item in result.dataList] if result.dataList else []
}
# Save to a file
if output_file:
output_path = Path(output_file)
output_path.parent.mkdir(parents=True, exist_ok=True)
with open(output_path, 'w', encoding='utf-8') as f:
json.dump(output, f, ensure_ascii=False, indent=2)
logger.info(f"Results saved to: {output_file}")
print(json.dumps(output, ensure_ascii=False, indent=2))
# Close the crawler
crawler.close()
# Exit code: success=0, failure=1
sys.exit(0 if result.success else 1)
except ValueError as e:
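
Review note: the branch order in `main()` (single date, then date range, then the default) can be checked without running the crawler. The following is a minimal sketch, assuming the same YYYYMMDD validation as `parse_date` in this file; `resolve_range` and its `today` parameter are hypothetical, added so the default branch can be tested deterministically. Unlike the committed code, the sketch truncates the default day to midnight.

```python
from datetime import datetime, timedelta
from typing import Optional, Tuple


def parse_date(value) -> datetime:
    """Parse YYYYMMDD given as a str or an int; raise ValueError otherwise."""
    if value is None:
        raise ValueError("date must not be empty")
    text = str(value).strip()
    if len(text) != 8 or not text.isdigit():
        raise ValueError(f"bad date {text!r}; expected YYYYMMDD")
    return datetime.strptime(text, "%Y%m%d")


def resolve_range(date=None, start=None, end=None,
                  today: Optional[datetime] = None) -> Tuple[datetime, datetime]:
    """Return (first_day, last_day) to crawl, following the branch order in main()."""
    today = today or datetime.now()
    if date:
        if start or end:
            raise ValueError("date cannot be combined with startDate/endDate")
        day = parse_date(date)
        return day, day
    if start and end:
        first, last = parse_date(start), parse_date(end)
        if first > last:
            raise ValueError("start date must not be later than end date")
        return first, last
    if start or end:
        raise ValueError("startDate and endDate must be given together")
    # Default: yesterday, truncated to midnight (the truncation is this sketch's choice)
    yesterday = (today - timedelta(days=1)).replace(hour=0, minute=0, second=0, microsecond=0)
    return yesterday, yesterday
```

In the committed code, `args.yesterday if args.yesterday else True` evaluates to `True` whether or not `--yesterday` is passed, so the "today" branch of the default mode is unreachable. The `if date:` check inside the range branch is also never taken, because that branch is only reached when `date` is empty.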

0
schoolNewsCrawler/lxml Normal file


@@ -5,7 +5,9 @@
import sys
import json
import argparse
from typing import List
from pathlib import Path
from loguru import logger
from crawler.RmrbCrawler import RmrbCrawler
from crawler.BaseCrawler import NewsItem
@@ -83,17 +85,62 @@ def save_to_json(news_list: List[dict], output_file: str = "output/news.json"):
def main():
"""Main entry point"""
# Parse command-line arguments
category = "politics"
limit = 20
output_file = "output/news.json"
# Create the argument parser
parser = argparse.ArgumentParser(
description="People's Daily news crawler main program",
formatter_class=argparse.RawDescriptionHelpFormatter
)
if len(sys.argv) > 1:
category = sys.argv[1]
if len(sys.argv) > 2:
limit = int(sys.argv[2])
if len(sys.argv) > 3:
output_file = sys.argv[3]
# Add positional arguments (kept for backward compatibility)
parser.add_argument(
'category',
nargs='?',
default='politics',
help='News category (default: politics)'
)
parser.add_argument(
'limit',
nargs='?',
type=int,
default=20,
help='Number of items to crawl (default: 20)'
)
parser.add_argument(
'output_file',
nargs='?',
default='output/news.json',
help='Output file path (default: output/news.json)'
)
# Add support for a JSON argument
parser.add_argument(
'--json', '-j',
type=str,
help='Parameters in JSON format (takes precedence over the other arguments)'
)
args = parser.parse_args()
# Resolve parameters: the JSON argument takes precedence
if args.json:
try:
json_data = json.loads(args.json)
params = json_data.get('params', {})
category = params.get('category', 'politics')
limit = params.get('limit', 20)
output_file = json_data.get('outputFile', 'output/news.json')
logger.info("Using JSON-argument mode")
except Exception as e:
logger.error(f"Failed to parse the JSON argument: {e}")
sys.exit(1)
else:
# Use command-line arguments
category = args.category
limit = args.limit
output_file = args.output_file
logger.info("Using command-line-argument mode")
logger.info("=" * 60)
logger.info("News crawler started")
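
Review note: the precedence rule introduced in this file (a `--json` payload overrides the positional arguments) can be exercised on its own. The following is a minimal sketch under stated assumptions; `resolve_params` is a hypothetical helper, and the `int(...)` coercion of `limit` is an addition of this sketch that the diff does not perform.

```python
import argparse
import json
from typing import List, Optional, Tuple


def resolve_params(argv: Optional[List[str]] = None) -> Tuple[str, int, str]:
    """Return (category, limit, output_file); a --json payload wins over positionals."""
    parser = argparse.ArgumentParser()
    parser.add_argument("category", nargs="?", default="politics")
    parser.add_argument("limit", nargs="?", type=int, default=20)
    parser.add_argument("output_file", nargs="?", default="output/news.json")
    parser.add_argument("--json", "-j", type=str)
    args = parser.parse_args(argv)

    if args.json:
        payload = json.loads(args.json)
        params = payload.get("params", {})
        return (
            params.get("category", "politics"),
            int(params.get("limit", 20)),  # coercion added here; JSON may carry "20" as a string
            payload.get("outputFile", "output/news.json"),
        )
    return args.category, args.limit, args.output_file


print(resolve_params(["tech", "5", "out.json"]))
print(resolve_params(["--json", '{"params": {"category": "sports"}}']))
```

Keeping the positional arguments optional (`nargs='?'`) is what lets older invocations such as `main.py politics 20 output/news.json` keep working next to the JSON form.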


@@ -66,7 +66,7 @@ CREATE TABLE `tb_data_collection_item` (
`id` VARCHAR(64) NOT NULL COMMENT 'Primary key ID',
`task_id` VARCHAR(64) NOT NULL COMMENT 'Associated task ID',
`log_id` VARCHAR(64) NOT NULL COMMENT 'Associated execution log ID',
`title` VARCHAR(255) NOT NULL COMMENT 'Article title',
`title` VARCHAR(255) DEFAULT NULL COMMENT 'Article title',
`content` LONGTEXT DEFAULT NULL COMMENT 'Article content (HTML)',
`summary` VARCHAR(500) DEFAULT NULL COMMENT 'Article summary',
`source` VARCHAR(255) DEFAULT NULL COMMENT 'Source (e.g. 人民日报)',


@@ -114,35 +114,49 @@ school-news:
crawler:
python:
path: F:\Environment\Conda\envs\shoolNewsCrewer
base:
path: F:/Project/schoolNews/schoolNewsCrawler
# Path to the Python executable (on Windows, point to python.exe; if it is already on PATH, "python" can be used directly)
pythonPath: F:/Environment/Conda/envs/schoolNewsCrawler/python.exe
# Root directory of the crawler scripts (working directory of NewsCrawlerTask)
basePath: F:/Project/schoolNews/schoolNewsCrawler
crontab:
items: # list of scheduled tasks the front end can choose from
- name: People's Daily news crawl
methods: # crawl methods
- name: Keyword search crawl
class: org.xyzh.crontab.task.newsTask.NewsCrawlerTask
clazz: newsCrewerTask
excuete_method: execute
path: crawler/RmrbSearch.py
params:
query: String # search keyword
total: Integer # total number of news items
- name: query
description: Search keyword
type: String
value: ""
- name: total
description: Total number of news items
type: Integer
value: 10
- name: Ranking crawl
class: org.xyzh.crontab.task.newsTask.NewsCrawlerTask
clazz: newsCrewerTask
excuete_method: execute
path: crawler/RmrbHotPoint.py
- name: Past top headlines crawl
class: org.xyzh.crontab.task.newsTask.NewsCrawlerTask
clazz: newsCrewerTask
excuete_method: execute
path: crawler/RmrbTrending.py
params:
startDate: String # start date
endDate: String # end date
isYestoday: Boolean # whether to query yesterday
- name: startDate
description: Start date
type: String
value: ""
- name: endDate
description: End date
type: String
value: ""
- name: yesterday
description: Whether to query yesterday
type: Boolean
value: true
# File storage configuration
file:


@@ -111,6 +111,9 @@
<Logger name="org.xyzh.news.mapper" level="debug" additivity="false">
<AppenderRef ref="Console"/>
</Logger>
<Logger name="org.xyzh.crontab.mapper" level="debug" additivity="false">
<AppenderRef ref="Console"/>
</Logger>
<!-- Project package logging configuration - Auth module -->
<Logger name="org.xyzh.auth" level="debug" additivity="false">
@@ -162,6 +165,15 @@
<AppenderRef ref="DatabaseAppender"/>
</Logger>
<Logger name="org.xyzh.crontab" level="debug" additivity="false">
<AppenderRef ref="Console"/>
<AppenderRef ref="Filelog"/>
<AppenderRef ref="RollingFileInfo"/>
<AppenderRef ref="RollingFileWarn"/>
<AppenderRef ref="RollingFileError"/>
<AppenderRef ref="DatabaseAppender"/>
</Logger>
<root level="info">
<appender-ref ref="Console"/>
<appender-ref ref="Filelog"/>


@@ -1,5 +1,7 @@
package org.xyzh.api.crontab;
import java.util.List;
import org.xyzh.common.core.domain.ResultDomain;
import org.xyzh.common.core.page.PageParam;
import org.xyzh.common.dto.crontab.TbDataCollectionItem;
@@ -30,7 +32,7 @@ public interface DataCollectionItemService {
* @author yslg
* @since 2025-11-08
*/
ResultDomain<Integer> batchCreateItems(java.util.List<TbDataCollectionItem> itemList);
ResultDomain<Integer> batchCreateItems(List<TbDataCollectionItem> itemList);
/**
* @description Update a collection item


@@ -1,12 +1,10 @@
package org.xyzh.common.vo;
import org.xyzh.common.dto.crontab.TbDataCollectionItem;
import org.xyzh.common.dto.crontab.TbCrontabTask;
import java.io.Serializable;
import java.util.Date;
/**
* @description Data collection item VO
* @description Data collection item VO (flat structure that includes the associated task and log information)
* @filename DataCollectionItemVO.java
* @author yslg
* @copyright xyzh
@@ -16,53 +14,414 @@ public class DataCollectionItemVO implements Serializable {
private static final long serialVersionUID = 1L;
/**
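* @description Collection item data
*/
private TbDataCollectionItem item;
// ==================== Basic collection item information ====================
/**
* @description Associated scheduled task information
* Collection item ID
*/
private TbCrontabTask task;
private String id;
/**
* @description Status text (for display in the front end)
* Task ID
*/
private String statusText;
private String taskId;
/**
* @description Whether the item can be edited (unprocessed and ignored items can be edited)
* Log ID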
*/
private String logId;
/**
* Article title
*/
private String title;
/**
* Article content (HTML)
*/
private String content;
/**
* Article summary
*/
private String summary;
/**
* Source
*/
private String source;
/**
* Source URL
*/
private String sourceUrl;
/**
* Category
*/
private String category;
/**
* Author
*/
private String author;
/**
* Publish time
*/
private Date publishTime;
/**
* Cover image URL
*/
private String coverImage;
/**
* Image list (JSON)
*/
private String images;
/**
* Tags
*/
private String tags;
/**
* Status (0: unprocessed, 1: converted to a resource, 2: ignored)
*/
private Integer status;
/**
* ID of the converted resource
*/
private String resourceId;
/**
* Crawl time
*/
private Date crawlTime;
/**
* Processing time
*/
private Date processTime;
/**
* Processor
*/
private String processor;
/**
* Creation time
*/
private Date createTime;
/**
* Update time
*/
private Date updateTime;
// ==================== Associated task information ====================
/**
* Task name
*/
private String taskName;
/**
* Task group
*/
private String taskGroup;
/**
* Bean name
*/
private String beanName;
/**
* Method name
*/
private String methodName;
/**
* Method parameters
*/
private String methodParams;
// ==================== Associated log information ====================
/**
* Execution status (0: failed, 1: succeeded)
*/
private Integer executeStatus;
/**
* Execution duration (ms)
*/
private Long executeDuration;
/**
* Start time
*/
private Date startTime;
/**
* End time
*/
private Date endTime;
// ==================== Extension fields ====================
/**
* Whether the item can be edited (unprocessed and ignored items can be edited)
*/
private Boolean canEdit;
/**
* @description Whether the item can be converted to a resource (unprocessed items can be converted)
* Whether the item can be converted to a resource (unprocessed items can be converted)
*/
private Boolean canConvert;
public TbDataCollectionItem getItem() {
return item;
// ==================== Getter/Setter ====================
public String getId() {
return id;
}
public void setItem(TbDataCollectionItem item) {
this.item = item;
public void setId(String id) {
this.id = id;
}
public TbCrontabTask getTask() {
return task;
public String getTaskId() {
return taskId;
}
public void setTask(TbCrontabTask task) {
this.task = task;
public void setTaskId(String taskId) {
this.taskId = taskId;
}
public String getStatusText() {
return statusText;
public String getLogId() {
return logId;
}
public void setStatusText(String statusText) {
this.statusText = statusText;
public void setLogId(String logId) {
this.logId = logId;
}
public String getTitle() {
return title;
}
public void setTitle(String title) {
this.title = title;
}
public String getContent() {
return content;
}
public void setContent(String content) {
this.content = content;
}
public String getSummary() {
return summary;
}
public void setSummary(String summary) {
this.summary = summary;
}
public String getSource() {
return source;
}
public void setSource(String source) {
this.source = source;
}
public String getSourceUrl() {
return sourceUrl;
}
public void setSourceUrl(String sourceUrl) {
this.sourceUrl = sourceUrl;
}
public String getCategory() {
return category;
}
public void setCategory(String category) {
this.category = category;
}
public String getAuthor() {
return author;
}
public void setAuthor(String author) {
this.author = author;
}
public Date getPublishTime() {
return publishTime;
}
public void setPublishTime(Date publishTime) {
this.publishTime = publishTime;
}
public String getCoverImage() {
return coverImage;
}
public void setCoverImage(String coverImage) {
this.coverImage = coverImage;
}
public String getImages() {
return images;
}
public void setImages(String images) {
this.images = images;
}
public String getTags() {
return tags;
}
public void setTags(String tags) {
this.tags = tags;
}
public Integer getStatus() {
return status;
}
public void setStatus(Integer status) {
this.status = status;
}
public String getResourceId() {
return resourceId;
}
public void setResourceId(String resourceId) {
this.resourceId = resourceId;
}
public Date getCrawlTime() {
return crawlTime;
}
public void setCrawlTime(Date crawlTime) {
this.crawlTime = crawlTime;
}
public Date getProcessTime() {
return processTime;
}
public void setProcessTime(Date processTime) {
this.processTime = processTime;
}
public String getProcessor() {
return processor;
}
public void setProcessor(String processor) {
this.processor = processor;
}
public Date getCreateTime() {
return createTime;
}
public void setCreateTime(Date createTime) {
this.createTime = createTime;
}
public Date getUpdateTime() {
return updateTime;
}
public void setUpdateTime(Date updateTime) {
this.updateTime = updateTime;
}
public String getTaskName() {
return taskName;
}
public void setTaskName(String taskName) {
this.taskName = taskName;
}
public String getTaskGroup() {
return taskGroup;
}
public void setTaskGroup(String taskGroup) {
this.taskGroup = taskGroup;
}
public String getBeanName() {
return beanName;
}
public void setBeanName(String beanName) {
this.beanName = beanName;
}
public String getMethodName() {
return methodName;
}
public void setMethodName(String methodName) {
this.methodName = methodName;
}
public String getMethodParams() {
return methodParams;
}
public void setMethodParams(String methodParams) {
this.methodParams = methodParams;
}
public Integer getExecuteStatus() {
return executeStatus;
}
public void setExecuteStatus(Integer executeStatus) {
this.executeStatus = executeStatus;
}
public Long getExecuteDuration() {
return executeDuration;
}
public void setExecuteDuration(Long executeDuration) {
this.executeDuration = executeDuration;
}
public Date getStartTime() {
return startTime;
}
public void setStartTime(Date startTime) {
this.startTime = startTime;
}
public Date getEndTime() {
return endTime;
}
public void setEndTime(Date endTime) {
this.endTime = endTime;
}
public Boolean getCanEdit() {

View File

@@ -1,5 +1,6 @@
package org.xyzh.crontab.config;
import org.springframework.beans.factory.annotation.Value;
import org.springframework.boot.context.properties.ConfigurationProperties;
import lombok.Data;
import org.springframework.stereotype.Component;
@@ -9,8 +10,10 @@ import org.springframework.stereotype.Component;
@Component
public class CrawlerProperties {
@Value("${crawler.pythonPath}")
private String pythonPath;
@Value("${crawler.basePath}")
private String basePath;
}

View File

@@ -12,6 +12,10 @@ import org.xyzh.common.dto.crontab.TbCrontabTask;
import org.xyzh.common.dto.crontab.TbCrontabLog;
import org.xyzh.common.utils.IDUtils;
import org.xyzh.crontab.pojo.CrontabItem;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.JSONObject;
import org.xyzh.common.utils.spring.SpringContextUtil;
import org.xyzh.crontab.config.CrontabProperties;
@@ -47,6 +51,14 @@ public class CrontabController {
// Return only the crawler capability metadata (the task template list); no scheduling details
CrontabProperties props =
SpringContextUtil.getBean(CrontabProperties.class);
String jString = JSON.toJSONString(props);
props = JSON.parseObject(jString, CrontabProperties.class);
props.getItems().forEach(item->item.getMethods().forEach(
method->{
method.setClazz(null);
method.setExcuete_method(null);
method.setPath(null);
}));
rd.success("ok", props.getItems());
} catch (Exception e) {
rd.fail("获取可创建定时任务失败: " + e.getMessage());
@@ -63,6 +75,25 @@ public class CrontabController {
public ResultDomain<TbCrontabTask> createCrontab(@RequestBody TbCrontabTask crontabItem) {
ResultDomain<TbCrontabTask> rd = new ResultDomain<>();
try {
// Look up the config by taskGroup and methodName, then fill in beanName and methodName
if (crontabItem.getBeanName() == null || crontabItem.getBeanName().isEmpty()) {
CrontabItem.CrontabMethod method = findMethodByTaskGroupAndMethodName(
crontabItem.getTaskGroup(),
crontabItem.getMethodName()
);
if (method != null) {
crontabItem.setBeanName(method.getClazz()); // set the bean name
crontabItem.setMethodName(method.getExcuete_method()); // set the name of the method to execute
JSONObject methodParams = JSON.parseObject(crontabItem.getMethodParams());
methodParams.put("scriptPath", method.getPath());
crontabItem.setMethodParams(methodParams.toJSONString());
} else {
rd.fail("未找到对应的配置: taskGroup=" + crontabItem.getTaskGroup()
+ ", methodName=" + crontabItem.getMethodName());
return rd;
}
}
return crontabService.createTask(crontabItem);
} catch (Exception e) {
logger.error("创建定时任务失败", e);
@@ -71,6 +102,27 @@ public class CrontabController {
}
}
/**
* Look up the method configuration matching taskGroup and methodName
*/
private CrontabItem.CrontabMethod findMethodByTaskGroupAndMethodName(String taskGroup, String methodName) {
CrontabProperties props = SpringContextUtil.getBean(CrontabProperties.class);
if (props == null || props.getItems() == null) {
return null;
}
for (CrontabItem item : props.getItems()) {
if (item.getName().equals(taskGroup)) {
for (CrontabItem.CrontabMethod method : item.getMethods()) {
if (method.getName().equals(methodName)) {
return method;
}
}
}
}
return null;
}
/**
* Update a scheduled task
* @param crontabItem
@@ -80,6 +132,21 @@ public class CrontabController {
public ResultDomain<TbCrontabTask> updateCrontab(@RequestBody TbCrontabTask crontabItem) {
ResultDomain<TbCrontabTask> rd = new ResultDomain<>();
try {
// Look up the config by taskGroup and methodName, then fill in beanName and methodName
if (crontabItem.getBeanName() == null || crontabItem.getBeanName().isEmpty()) {
CrontabItem.CrontabMethod method = findMethodByTaskGroupAndMethodName(
crontabItem.getTaskGroup(),
crontabItem.getMethodName()
);
if (method != null) {
crontabItem.setBeanName(method.getClazz()); // set the bean name
crontabItem.setMethodName(method.getExcuete_method()); // set the name of the method to execute
} else {
rd.fail("未找到对应的配置: taskGroup=" + crontabItem.getTaskGroup()
+ ", methodName=" + crontabItem.getMethodName());
return rd;
}
}
return crontabService.updateTask(crontabItem);
} catch (Exception e) {
logger.error("更新定时任务失败", e);
@@ -147,5 +214,87 @@ public class CrontabController {
}
}
/**
* Query log details by ID
* @param logId log ID
* @return ResultDomain<TbCrontabLog>
*/
@GetMapping("/log/{logId}")
public ResultDomain<TbCrontabLog> getLogById(@PathVariable(required = true, name="logId") String logId) {
ResultDomain<TbCrontabLog> rd = new ResultDomain<>();
try {
return crontabService.getLogById(logId);
} catch (Exception e) {
logger.error("获取日志详情失败", e);
rd.fail("获取日志详情失败: " + e.getMessage());
return rd;
}
}
@GetMapping("/task/validate")
public ResultDomain<String> validateCronExpression(@RequestParam(required = true, name="cronExpression") String cronExpression) {
ResultDomain<String> rd = new ResultDomain<>();
try {
return crontabService.validateCronExpression(cronExpression);
} catch (Exception e) {
logger.error("验证Cron表达式失败", e);
rd.fail("验证Cron表达式失败: " + e.getMessage());
return rd;
}
}
/**
* @description Start a scheduled task
* @param
* @author yslg
* @since 2025-11-11
*/
@PostMapping("/task/start/{taskId}")
public ResultDomain<TbCrontabTask> startTask(@PathVariable(required = true, name="taskId") String taskId) {
ResultDomain<TbCrontabTask> rd = new ResultDomain<>();
try {
return crontabService.startTask(taskId);
} catch (Exception e) {
logger.error("启动定时任务失败", e);
rd.fail("启动定时任务失败: " + e.getMessage());
return rd;
}
}
/**
* @description Pause a scheduled task
* @param
* @author yslg
* @since 2025-11-11
*/
@PostMapping("/task/pause/{taskId}")
public ResultDomain<TbCrontabTask> pauseTask(@PathVariable(required = true, name="taskId") String taskId) {
ResultDomain<TbCrontabTask> rd = new ResultDomain<>();
try {
return crontabService.pauseTask(taskId);
} catch (Exception e) {
logger.error("暂停定时任务失败", e);
rd.fail("暂停定时任务失败: " + e.getMessage());
return rd;
}
}
/**
* @description Run the task once immediately
* @param
* @author yslg
* @since 2025-11-11
*/
@PostMapping("/task/execute/{taskId}")
public ResultDomain<TbCrontabTask> executeTaskOnce(@PathVariable(required = true, name="taskId") String taskId) {
ResultDomain<TbCrontabTask> rd = new ResultDomain<>();
try {
return crontabService.executeTaskOnce(taskId);
} catch (Exception e) {
logger.error("执行定时任务失败", e);
rd.fail("执行定时任务失败: " + e.getMessage());
return rd;
}
}
}

View File

@@ -5,6 +5,7 @@ import org.apache.ibatis.annotations.Mapper;
import org.apache.ibatis.annotations.Param;
import org.xyzh.common.core.page.PageParam;
import org.xyzh.common.dto.crontab.TbDataCollectionItem;
import org.xyzh.common.vo.DataCollectionItemVO;
import java.util.List;
@@ -82,5 +83,45 @@ public interface DataCollectionItemMapper extends BaseMapper<TbDataCollectionIte
* @since 2025-11-08
*/
long countByStatus(@Param("taskId") String taskId, @Param("status") Integer status);
// ==================== VO queries (use JOINs to return the full VO) ====================
/**
* @description Query a collection item VO by ID, including the associated task and log information
* @param itemId collection item ID
* @return DataCollectionItemVO the collection item VO
* @author yslg
* @since 2025-11-08
*/
DataCollectionItemVO selectVOById(@Param("itemId") String itemId);
/**
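* @description Query the list of collection item VOs, including the associated task and log information
* @param filter filter conditions
* @return List<DataCollectionItemVO> list of collection item VOs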
* @author yslg
* @since 2025-11-08
*/
List<DataCollectionItemVO> selectVOList(TbDataCollectionItem filter);
/**
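* @description Query a page of collection item VOs, including the associated task and log information
* @param filter filter conditions
* @param pageParam paging parameters
* @return List<DataCollectionItemVO> list of collection item VOs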
* @author yslg
* @since 2025-11-08
*/
List<DataCollectionItemVO> selectVOPage(@Param("filter") TbDataCollectionItem filter, @Param("pageParam") PageParam pageParam);
/**
* @description Query the list of collection item VOs by task ID, including the associated task and log information
* @param taskId task ID
* @return List<DataCollectionItemVO> list of collection item VOs
* @author yslg
* @since 2025-11-08
*/
List<DataCollectionItemVO> selectVOByTaskId(@Param("taskId") String taskId);
}

View File

@@ -16,9 +16,17 @@ public class CrontabItem {
@Data
public static class CrontabMethod {
private String name;
@JSONField(name = "class")
private String clazz;
private String excuete_method;
private String path;
private Map<String, Object> params;
private List<CrontabParam> params;
}
@Data
public static class CrontabParam {
private String name;
private String description;
private String type;
private Object value;
}
}

View File

@@ -11,9 +11,13 @@ import org.xyzh.common.utils.IDUtils;
import org.xyzh.crontab.mapper.CrontabLogMapper;
import org.xyzh.crontab.pojo.TaskParams;
import com.alibaba.fastjson2.JSON;
import com.alibaba.fastjson2.TypeReference;
import java.lang.reflect.Method;
import java.util.Date;
import java.util.HashMap;
import java.util.Map;
/**
* @description Task executor
@@ -138,22 +142,26 @@ public class TaskExecutor {
private String injectTaskContext(Object bean, TbCrontabTask task, TbCrontabLog log) {
String methodParams = task.getMethodParams();
// If the bean is a subclass of BaseTask, inject taskId and logId into the JSON params
if (bean instanceof org.xyzh.crontab.task.BaseTask) {
try {
TaskParams taskParams = TaskParams.fromJson(methodParams);
if (taskParams != null) {
// Build the full TaskParams from the task object
TaskParams taskParams = new TaskParams();
taskParams.setTaskGroup(task.getTaskGroup()); // taken from the task table
taskParams.setMethodName(task.getMethodName()); // taken from the task table
// Parse methodParams into a Map and store it in the params field
Map<String, Object> params = JSON.parseObject(methodParams,
new TypeReference<Map<String, Object>>(){});
// Inject taskId and logId
if (taskParams.getParams() == null) {
taskParams.setParams(new HashMap<>());
}
taskParams.getParams().put("taskId", task.getTaskId());
taskParams.getParams().put("logId", log.getID());
params.put("taskId", task.getTaskId());
params.put("logId", log.getID());
taskParams.setParams(params);
methodParams = taskParams.toJson();
logger.debug("已注入任务上下文: taskId={}, logId={}", task.getTaskId(), log.getID());
}
} catch (Exception e) {
logger.warn("注入任务上下文失败,使用原始参数: {}", e.getMessage());
logger.warn("构建TaskParams失败: {}", e.getMessage());
}
}
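
The context-injection step in this hunk can be sketched with plain `java.util` maps, leaving fastjson2 out. This is only an illustration under stated assumptions: `TaskContextSketch` and `buildTaskParams` are hypothetical names that do not appear in the commit, and the envelope keys simply mirror the `taskGroup`, `methodName`, and `params` fields set above. The sketch copies the raw params defensively, because the parsed map may be null when `methodParams` is null or empty, in which case there would be nothing to put `taskId` and `logId` into.

```java
import java.util.HashMap;
import java.util.LinkedHashMap;
import java.util.Map;

// Hypothetical helper, not part of the commit.
final class TaskContextSketch {

    /** Wraps the raw method params in an envelope and injects taskId and logId. */
    static Map<String, Object> buildTaskParams(String taskGroup, String methodName,
                                               Map<String, Object> rawParams,
                                               String taskId, String logId) {
        // Copy defensively so that a null input still yields a usable map.
        Map<String, Object> params = new HashMap<>();
        if (rawParams != null) {
            params.putAll(rawParams);
        }
        params.put("taskId", taskId);
        params.put("logId", logId);

        // The envelope mirrors the three fields set on TaskParams in the diff.
        Map<String, Object> envelope = new LinkedHashMap<>();
        envelope.put("taskGroup", taskGroup);
        envelope.put("methodName", methodName);
        envelope.put("params", params);
        return envelope;
    }

    public static void main(String[] args) {
        Map<String, Object> raw = new HashMap<>();
        raw.put("query", "example");
        System.out.println(buildTaskParams("group", "method", raw, "task-1", "log-1"));
    }
}
```

Copying into a fresh map also keeps the caller's map untouched, which matters if the same params object is reused across executions.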

View File

@@ -23,7 +23,6 @@ import org.xyzh.system.utils.LoginUtil;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;
/**
* @description Implementation of the data collection item service
@@ -102,29 +101,9 @@ public class DataCollectionItemServiceImpl implements DataCollectionItemService
int successCount = 0;
Date now = new Date();
for (TbDataCollectionItem item : itemList) {
// Check whether the URL already exists (deduplication)
if (item.getSourceUrl() != null && !item.getSourceUrl().isEmpty()) {
TbDataCollectionItem existing = itemMapper.selectBySourceUrl(item.getSourceUrl());
if (existing != null) {
logger.debug("跳过已存在的采集项: {}", item.getSourceUrl());
continue;
}
}
// Set default values
item.setID(IDUtils.generateID());
item.setCreateTime(now);
item.setDeleted(false);
if (item.getStatus() == null) {
item.setStatus(0);
}
if (item.getCrawlTime() == null) {
item.setCrawlTime(now);
}
itemMapper.insert(item);
successCount++;
int result = itemMapper.batchInsertItems(itemList);
if (result > 0) {
successCount = result;
}
logger.info("批量创建采集项成功,共{}条,成功{}条", itemList.size(), successCount);
@@ -195,9 +174,8 @@ public class DataCollectionItemServiceImpl implements DataCollectionItemService
return resultDomain;
}
TbDataCollectionItem item = itemMapper.selectById(itemId);
if (item != null) {
DataCollectionItemVO vo = buildVO(item);
DataCollectionItemVO vo = itemMapper.selectVOById(itemId);
if (vo != null) {
resultDomain.success("查询成功", vo);
} else {
resultDomain.fail("采集项不存在");
@@ -218,10 +196,8 @@ public class DataCollectionItemServiceImpl implements DataCollectionItemService
}
filter.setDeleted(false);
List<TbDataCollectionItem> list = itemMapper.selectItemList(filter);
List<DataCollectionItemVO> voList = list.stream()
.map(this::buildVO)
.collect(Collectors.toList());
List<DataCollectionItemVO> voList = itemMapper.selectVOList(filter);
resultDomain.success("查询成功", voList);
} catch (Exception e) {
@@ -244,12 +220,9 @@ public class DataCollectionItemServiceImpl implements DataCollectionItemService
pageParam = new PageParam();
}
List<TbDataCollectionItem> list = itemMapper.selectItemPage(filter, pageParam);
long total = itemMapper.countItems(filter);
List<DataCollectionItemVO> voList = itemMapper.selectVOPage(filter, pageParam);
List<DataCollectionItemVO> voList = list.stream()
.map(this::buildVO)
.collect(Collectors.toList());
long total = itemMapper.countItems(filter);
PageDomain<DataCollectionItemVO> pageDomain = new PageDomain<>();
pageDomain.setDataList(voList);
@@ -274,10 +247,8 @@ public class DataCollectionItemServiceImpl implements DataCollectionItemService
return resultDomain;
}
List<TbDataCollectionItem> list = itemMapper.selectByTaskId(taskId);
List<DataCollectionItemVO> voList = list.stream()
.map(this::buildVO)
.collect(Collectors.toList());
List<DataCollectionItemVO> voList = itemMapper.selectVOByTaskId(taskId);
resultDomain.success("查询成功", voList);
} catch (Exception e) {
@@ -433,47 +404,5 @@ public class DataCollectionItemServiceImpl implements DataCollectionItemService
return resultDomain;
}
/**
* @description Build the VO object
* @param item collection item
* @return DataCollectionItemVO
* @author yslg
* @since 2025-11-08
*/
private DataCollectionItemVO buildVO(TbDataCollectionItem item) {
DataCollectionItemVO vo = new DataCollectionItemVO();
vo.setItem(item);
// Look up the associated scheduled task
if (item.getTaskId() != null && !item.getTaskId().isEmpty()) {
TbCrontabTask task = taskMapper.selectTaskById(item.getTaskId());
vo.setTask(task);
}
// Set the status text
String statusText = "未处理";
if (item.getStatus() != null) {
switch (item.getStatus()) {
case 0:
statusText = "未处理";
break;
case 1:
statusText = "已转换为资源";
break;
case 2:
statusText = "已忽略";
break;
default:
statusText = "未知";
}
}
vo.setStatusText(statusText);
// Set the operation permissions
vo.setCanEdit(item.getStatus() == null || item.getStatus() == 0 || item.getStatus() == 2);
vo.setCanConvert(item.getStatus() == null || item.getStatus() == 0);
return vo;
}
}

View File

@@ -8,6 +8,7 @@ import java.nio.file.Files;
import java.nio.file.Path;
import java.nio.file.Paths;
import java.util.List;
import java.util.Map;
import java.util.concurrent.TimeUnit;
/**
@@ -41,6 +42,11 @@ public abstract class CommandTask extends BaseTask {
processBuilder.directory(workDir.toFile());
processBuilder.redirectErrorStream(true);
// Set environment variables to force Python to use UTF-8 (works around the Windows GBK encoding problem)
Map<String, String> env = processBuilder.environment();
env.put("PYTHONIOENCODING", "utf-8"); // Python I/O encoding
env.put("PYTHONUTF8", "1"); // Python 3.7+ UTF-8 mode
// Start the process
Process process = processBuilder.start();
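
The environment setup in this hunk can be isolated as a small sketch. The class `Utf8ProcessSketch` and the method `pythonProcess` are hypothetical names used only for illustration, and `python` / `main.py` in `main` are placeholder values; the two environment variables are the ones the diff sets. The process is configured but deliberately not started, so the sketch runs without a Python installation.

```java
import java.util.List;
import java.util.Map;

// Hypothetical helper, not part of the commit.
final class Utf8ProcessSketch {

    /** Configures a ProcessBuilder so a Python child process uses UTF-8 for its I/O. */
    static ProcessBuilder pythonProcess(List<String> command) {
        ProcessBuilder pb = new ProcessBuilder(command);
        // Merge stderr into stdout so a single reader sees all output.
        pb.redirectErrorStream(true);

        Map<String, String> env = pb.environment();
        env.put("PYTHONIOENCODING", "utf-8"); // encoding for stdin/stdout/stderr
        env.put("PYTHONUTF8", "1");           // UTF-8 mode, Python 3.7+
        return pb;
    }

    public static void main(String[] args) {
        // Placeholder command; the process is not started here.
        ProcessBuilder pb = pythonProcess(List.of("python", "main.py"));
        System.out.println(pb.command());
        System.out.println(pb.environment().get("PYTHONIOENCODING"));
    }
}
```

For the fix to hold end to end, the Java side presumably also needs to decode the child's output as UTF-8, for example by wrapping `process.getInputStream()` in an `InputStreamReader` with `StandardCharsets.UTF_8`.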

View File

@@ -18,7 +18,6 @@ public abstract class PythonCommandTask extends CommandTask {
@Autowired
protected CrawlerProperties crawlerProperties;
/**
* Get the path to the Python executable
*/
@@ -47,18 +46,16 @@ public abstract class PythonCommandTask extends CommandTask {
/**
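* Build the Python command.
*
* Note: do not use cmd /c or bash -c; invoke the Python executable directly.
* This avoids the shell mishandling quotes inside JSON arguments.
* ProcessBuilder can launch the exe directly, and the arguments are passed through correctly.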
*/
@Override
protected List<String> buildCommand(TaskParams taskParams) throws Exception {
List<String> command = new ArrayList<>();
// Check the operating system
String os = System.getProperty("os.name").toLowerCase();
if (os.contains("win")) {
command.add("cmd");
command.add("/c");
}
// Invoke the Python executable directly, without a shell
command.add(getPythonPath());
// Add the Python script and arguments (implemented by subclasses)

View File

@@ -7,6 +7,7 @@ import org.springframework.stereotype.Component;
import org.xyzh.api.crontab.DataCollectionItemService;
import org.xyzh.common.core.domain.ResultDomain;
import org.xyzh.common.dto.crontab.TbDataCollectionItem;
import org.xyzh.common.utils.IDUtils;
import org.xyzh.crontab.config.CrontabProperties;
import org.xyzh.crontab.pojo.TaskParams;
import org.xyzh.crontab.task.PythonCommandTask;
@@ -17,7 +18,9 @@ import java.nio.file.Paths;
import java.text.SimpleDateFormat;
import java.util.ArrayList;
import java.util.Date;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
/**
* @description News crawler scheduled task
@@ -42,43 +45,58 @@ public class NewsCrawlerTask extends PythonCommandTask {
protected List<String> buildPythonArgs(TaskParams taskParams) throws Exception {
List<String> args = new ArrayList<>();
String methodName = taskParams.getMethodName();
String source = "rmrb";
String category = "politics";
String limit = "20";
// Build different arguments depending on the method name
if ("关键字搜索爬取".equals(methodName)) {
String query = taskParams.getParamAsString("query");
Integer total = taskParams.getParamAsInt("total");
category = query != null ? query : "politics";
limit = total != null ? total.toString() : "20";
} else if ("排行榜爬取".equals(methodName)) {
category = "ranking";
} else if ("往日精彩头条爬取".equals(methodName)) {
String startDate = taskParams.getParamAsString("startDate");
String endDate = taskParams.getParamAsString("endDate");
Boolean isYesterday = taskParams.getParamAsBoolean("isYesterday");
category = "history";
// The date parameters can be passed to the Python script here
// 1. Get scriptPath from params
String scriptPath = taskParams.getParamAsString("scriptPath");
if (scriptPath == null || scriptPath.isEmpty()) {
throw new Exception("scriptPath参数缺失");
}
// Generate the output file name
// 2. Generate the output file name
String timestamp = String.valueOf(System.currentTimeMillis());
String outputFile = String.format("output/news_%s_%s_%s.json", source, category, timestamp);
String outputFile = String.format("output/news_%s.json", timestamp);
// Save the output file path in params for handleResult to use
taskParams.setParam("_outputFile", outputFile);
// Add the script and arguments
args.add("main.py");
args.add(category);
args.add(limit);
// 4. Build the command arguments
args.add(scriptPath); // dynamic script path
// 5. Iterate over params to build the command-line arguments dynamically
if (taskParams.getParams() != null) {
for (Map.Entry<String, Object> entry : taskParams.getParams().entrySet()) {
String key = entry.getKey();
Object value = entry.getValue();
// Skip special parameters
if (key.startsWith("_") || key.equals("scriptPath") ||
key.equals("taskId") || key.equals("logId")) {
continue;
}
// Derive the corresponding Python argument name
String pythonArg = "--"+key;
if (pythonArg != null && value != null) {
if (value instanceof Boolean) {
// Boolean: pass only the flag name when true; pass nothing when false
if ((Boolean) value) {
args.add(pythonArg);
}
} else {
// String/Integer: pass the argument name followed by the value
args.add(pythonArg);
args.add(value.toString());
}
}
}
}
// 6. Always append the output argument
args.add("--output");
args.add(outputFile);
logger.info("爬虫参数 - 来源: {}, 分类: {}, 数: {}", source, category, limit);
logger.info("Python脚本: {}, 命令行参数: {}", scriptPath, String.join(" ", args.subList(1, args.size())));
return args;
}
@@ -98,11 +116,12 @@ public class NewsCrawlerTask extends PythonCommandTask {
// Read and parse the result file
String jsonContent = Files.readString(outputPath);
List<ArticleStruct> newsList = JSON.parseObject(
jsonContent,
new TypeReference<List<ArticleStruct>>() {}
);
ResultDomain<ArticleStruct> result = JSON.parseObject(jsonContent, new TypeReference<ResultDomain<ArticleStruct>>(){});
if (!result.isSuccess()) {
logger.error("爬取新闻失败: {}", result.getMessage());
return;
}
List<ArticleStruct> newsList = result.getDataList();
logger.info("成功爬取 {} 条新闻", newsList.size());
// Get taskId and logId
@@ -126,6 +145,8 @@ public class NewsCrawlerTask extends PythonCommandTask {
try {
List<TbDataCollectionItem> itemList = new ArrayList<>();
Date now = new Date();
SimpleDateFormat parser = new SimpleDateFormat("yyyy年MM月dd日HH:mm");
SimpleDateFormat dateFormat = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss");
for (ArticleStruct news : newsList) {
@@ -133,6 +154,7 @@ public class NewsCrawlerTask extends PythonCommandTask {
TbDataCollectionItem item = new TbDataCollectionItem();
// Basic information
item.setID(IDUtils.generateID());
item.setTaskId(taskId);
item.setLogId(logId);
item.setTitle(news.getTitle());
@@ -156,7 +178,7 @@ public class NewsCrawlerTask extends PythonCommandTask {
String publishTimeStr = news.getPublishTime();
if (publishTimeStr != null && !publishTimeStr.isEmpty()) {
try {
item.setPublishTime(dateFormat.parse(publishTimeStr));
item.setPublishTime(dateFormat.parse(dateFormat.format(parser.parse(publishTimeStr))));
} catch (Exception e) {
logger.warn("解析发布时间失败: {}", publishTimeStr);
item.setPublishTime(now);
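
The publish-time handling in this hunk can be sketched as a standalone helper. `PublishTimeSketch` and `parsePublishTime` are hypothetical names, and returning the fallback for an empty string is an assumption of this sketch rather than something the diff shows. The crawler emits timestamps such as `2025年11月08日17:22`; in the code the three CJK date markers are written as Unicode escapes so the file compiles whatever the source encoding is. Formatting the parsed `Date` and parsing it again, as the diff does, should yield the same instant as the single `parse` call used here.

```java
import java.text.ParseException;
import java.text.SimpleDateFormat;
import java.util.Date;
import java.util.Locale;

// Hypothetical helper, not part of the commit.
final class PublishTimeSketch {

    // Year, month, and day markers as Unicode escapes; they are literal text in
    // the pattern because only ASCII letters are pattern letters.
    private static final String SOURCE_PATTERN = "yyyy\u5e74MM\u6708dd\u65e5HH:mm";

    /** Parses the crawler's timestamp format, returning fallback when parsing fails. */
    static Date parsePublishTime(String text, Date fallback) {
        if (text == null || text.isEmpty()) {
            return fallback; // assumption: empty input is treated like a parse failure
        }
        try {
            // SimpleDateFormat is not thread-safe, so create one per call.
            SimpleDateFormat parser = new SimpleDateFormat(SOURCE_PATTERN, Locale.ROOT);
            parser.setLenient(false);
            return parser.parse(text);
        } catch (ParseException e) {
            return fallback;
        }
    }

    public static void main(String[] args) {
        Date parsed = parsePublishTime("2025\u5e7411\u670808\u65e517:22", new Date());
        SimpleDateFormat display = new SimpleDateFormat("yyyy-MM-dd HH:mm:ss", Locale.ROOT);
        System.out.println(display.format(parsed)); // prints "2025-11-08 17:22:00"
    }
}
```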

View File

@@ -1,28 +0,0 @@
crawler:
python:
path: F:\Environment\Conda\envs\shoolNewsCrewer
base:
path: F:/Project/schoolNews/schoolNewsCrawler
crontab:
items: # list of scheduled tasks the frontend can choose from
- name: 人民日报新闻爬取
methods: # crawl methods
- name: 关键字搜索爬取
class: org.xyzh.crontab.task.newsTask.NewsCrawlerTask
path: crawler/RmrbSearch.py
params:
query: String # search keyword
total: Integer # total number of news items
- name: 排行榜爬取
class: org.xyzh.crontab.task.newsTask.NewsCrawlerTask
path: crawler/RmrbHotPoint.py
- name: 往日精彩头条爬取
class: org.xyzh.crontab.task.newsTask.NewsCrawlerTask
path: crawler/RmrbTrending.py
params:
startDate: String # start date
endDate: String # end date
isYestoday: Boolean # whether to crawl yesterday

View File

@@ -0,0 +1,47 @@
crawler:
# Path to the Python executable (on Windows, point it at python.exe; if Python is already on PATH, "python" works)
pythonPath: F:/Environment/Conda/envs/schoolNewsCrawler/python.exe
# Crawler script root directory (the working directory of NewsCrawlerTask)
basePath: F:/Project/schoolNews/schoolNewsCrawler
# The existing scheduled task list follows (unchanged, only moved to the correct file)
crontab:
items:
- name: 人民日报新闻爬取
methods:
- name: 关键字搜索爬取
clazz: newsCrewerTask
excuete_method: execute
path: crawler/RmrbSearch.py
params:
- name: query
description: 搜索关键字
type: String
value: ""
- name: total
description: 总新闻数量
type: Integer
value: 10
- name: 排行榜爬取
clazz: newsCrewerTask
excuete_method: execute
path: crawler/RmrbHotPoint.py
- name: 往日精彩头条爬取
clazz: newsCrewerTask
excuete_method: execute
path: crawler/RmrbTrending.py
params:
- name: startDate
description: 开始日期
type: String
value: ""
- name: endDate
description: 结束日期
type: String
value: ""
- name: yesterday
description: 是否是昨天
type: Boolean
value: true

View File

@@ -186,7 +186,7 @@
UPDATE tb_crontab_task
SET deleted = 1,
delete_time = NOW()
WHERE id = #{taskId} AND deleted = 0
WHERE task_id=#{taskId} AND deleted = 0
</update>
<!-- Query a task by ID -->
@@ -194,7 +194,7 @@
SELECT
<include refid="Base_Column_List" />
FROM tb_crontab_task
WHERE id = #{taskId} AND deleted = 0
WHERE task_id=#{taskId} AND deleted = 0
</select>
<!-- Query the task list by filter conditions -->
@@ -272,7 +272,7 @@
UPDATE tb_crontab_task
SET status = #{status},
update_time = NOW()
WHERE id = #{taskId} AND deleted = 0
WHERE task_id=#{taskId} AND deleted = 0
</update>
<!-- Query a task by bean name and method name -->

View File

@@ -0,0 +1,400 @@
<?xml version="1.0" encoding="UTF-8"?>
<!DOCTYPE mapper
PUBLIC "-//mybatis.org//DTD Mapper 3.0//EN"
"http://mybatis.org/dtd/mybatis-3-mapper.dtd">
<mapper namespace="org.xyzh.crontab.mapper.DataCollectionItemMapper">
<!-- Result map -->
<resultMap id="BaseResultMap" type="org.xyzh.common.dto.crontab.TbDataCollectionItem">
<id column="id" property="id" />
<result column="task_id" property="taskId" />
<result column="log_id" property="logId" />
<result column="title" property="title" />
<result column="content" property="content" />
<result column="summary" property="summary" />
<result column="source" property="source" />
<result column="source_url" property="sourceUrl" />
<result column="category" property="category" />
<result column="author" property="author" />
<result column="publish_time" property="publishTime" />
<result column="cover_image" property="coverImage" />
<result column="images" property="images" />
<result column="tags" property="tags" />
<result column="status" property="status" />
<result column="resource_id" property="resourceId" />
<result column="crawl_time" property="crawlTime" />
<result column="process_time" property="processTime" />
<result column="processor" property="processor" />
<result column="create_time" property="createTime" />
<result column="update_time" property="updateTime" />
<result column="delete_time" property="deleteTime" />
<result column="deleted" property="deleted" />
</resultMap>
<!-- VO result map (flat structure, including the associated task and log information) -->
<resultMap id="VOResultMap" type="org.xyzh.common.vo.DataCollectionItemVO">
<!-- Basic collection item fields -->
<result column="item_id" property="id" />
<result column="task_id" property="taskId" />
<result column="log_id" property="logId" />
<result column="title" property="title" />
<result column="content" property="content" />
<result column="summary" property="summary" />
<result column="source" property="source" />
<result column="source_url" property="sourceUrl" />
<result column="category" property="category" />
<result column="author" property="author" />
<result column="publish_time" property="publishTime" />
<result column="cover_image" property="coverImage" />
<result column="images" property="images" />
<result column="tags" property="tags" />
<result column="status" property="status" />
<result column="resource_id" property="resourceId" />
<result column="crawl_time" property="crawlTime" />
<result column="process_time" property="processTime" />
<result column="processor" property="processor" />
<result column="item_create_time" property="createTime" />
<result column="item_update_time" property="updateTime" />
<!-- Associated task information -->
<result column="task_name" property="taskName" />
<result column="task_group" property="taskGroup" />
<result column="bean_name" property="beanName" />
<result column="method_name" property="methodName" />
<result column="method_params" property="methodParams" />
<!-- Associated log information -->
<result column="execute_status" property="executeStatus" />
<result column="execute_duration" property="executeDuration" />
<result column="start_time" property="startTime" />
<result column="end_time" property="endTime" />
</resultMap>
<!-- Column list -->
<sql id="Base_Column_List">
id, task_id, log_id, title, content, summary, source, source_url, category, author,
publish_time, cover_image, images, tags, status, resource_id, crawl_time, process_time,
processor, create_time, update_time, delete_time, deleted
</sql>
<!-- VO query column list (including joined tables) -->
<sql id="VO_Column_List">
i.id as item_id,
i.task_id,
i.log_id,
i.title,
i.content,
i.summary,
i.source,
i.source_url,
i.category,
i.author,
i.publish_time,
i.cover_image,
i.images,
i.tags,
i.status,
i.resource_id,
i.crawl_time,
i.process_time,
i.processor,
i.create_time as item_create_time,
i.update_time as item_update_time,
t.task_name,
t.task_group,
t.bean_name,
t.method_name,
t.method_params,
l.execute_status,
l.execute_duration,
l.start_time,
l.end_time
</sql>
<!-- Dynamic query conditions (for methods that use @Param("filter")) -->
<sql id="Filter_Where_Clause">
<where>
deleted = 0
<if test="filter != null">
<if test="filter.id != null and filter.id != ''">
AND id = #{filter.id}
</if>
<if test="filter.taskId != null and filter.taskId != ''">
AND task_id = #{filter.taskId}
</if>
<if test="filter.logId != null and filter.logId != ''">
AND log_id = #{filter.logId}
</if>
<if test="filter.title != null and filter.title != ''">
AND title LIKE CONCAT('%', #{filter.title}, '%')
</if>
<if test="filter.source != null and filter.source != ''">
AND source = #{filter.source}
</if>
<if test="filter.sourceUrl != null and filter.sourceUrl != ''">
AND source_url = #{filter.sourceUrl}
</if>
<if test="filter.category != null and filter.category != ''">
AND category = #{filter.category}
</if>
<if test="filter.author != null and filter.author != ''">
AND author LIKE CONCAT('%', #{filter.author}, '%')
</if>
<if test="filter.status != null">
AND status = #{filter.status}
</if>
<if test="filter.resourceId != null and filter.resourceId != ''">
AND resource_id = #{filter.resourceId}
</if>
<if test="filter.processor != null and filter.processor != ''">
AND processor = #{filter.processor}
</if>
</if>
</where>
</sql>
<!-- Dynamic query conditions (for methods without @Param; property names are used directly) -->
<sql id="Item_Where_Clause">
<where>
deleted = 0
<if test="_parameter != null">
<if test="id != null and id != ''">
AND id = #{id}
</if>
<if test="taskId != null and taskId != ''">
AND task_id = #{taskId}
</if>
<if test="logId != null and logId != ''">
AND log_id = #{logId}
</if>
<if test="title != null and title != ''">
AND title LIKE CONCAT('%', #{title}, '%')
</if>
<if test="source != null and source != ''">
AND source = #{source}
</if>
<if test="sourceUrl != null and sourceUrl != ''">
AND source_url = #{sourceUrl}
</if>
<if test="category != null and category != ''">
AND category = #{category}
</if>
<if test="author != null and author != ''">
AND author LIKE CONCAT('%', #{author}, '%')
</if>
<if test="status != null">
AND status = #{status}
</if>
<if test="resourceId != null and resourceId != ''">
AND resource_id = #{resourceId}
</if>
<if test="processor != null and processor != ''">
AND processor = #{processor}
</if>
</if>
</where>
</sql>
<!-- Query a collection item by source URL (used for deduplication) -->
<select id="selectBySourceUrl" resultMap="BaseResultMap">
SELECT
<include refid="Base_Column_List" />
FROM tb_data_collection_item
WHERE source_url = #{sourceUrl}
AND deleted = 0
LIMIT 1
</select>
<!-- Query the collection item list by task ID -->
<select id="selectByTaskId" resultMap="BaseResultMap">
SELECT
<include refid="Base_Column_List" />
FROM tb_data_collection_item
WHERE task_id = #{taskId}
AND deleted = 0
ORDER BY create_time DESC
</select>
<!-- Query the collection item list -->
<select id="selectItemList" resultMap="BaseResultMap">
SELECT
<include refid="Base_Column_List" />
FROM tb_data_collection_item
<include refid="Item_Where_Clause" />
ORDER BY create_time DESC
</select>
<!-- Query a page of collection items -->
<select id="selectItemPage" resultMap="BaseResultMap">
SELECT
<include refid="Base_Column_List" />
FROM tb_data_collection_item
<include refid="Filter_Where_Clause" />
ORDER BY create_time DESC
LIMIT #{pageParam.pageSize} OFFSET #{pageParam.offset}
</select>
<!-- Count the total number of collection items -->
<select id="countItems" resultType="long">
SELECT COUNT(*)
FROM tb_data_collection_item
<include refid="Filter_Where_Clause" />
</select>
<!-- Count items by status -->
<select id="countByStatus" resultType="long">
SELECT COUNT(*)
FROM tb_data_collection_item
WHERE deleted = 0
<if test="taskId != null and taskId != ''">
AND task_id = #{taskId}
</if>
<if test="status != null">
AND status = #{status}
</if>
</select>
<!-- Batch insert collection items -->
<insert id="batchInsertItems">
INSERT INTO tb_data_collection_item (
id, task_id, log_id, title, content, summary, source, source_url,
category, author, publish_time, cover_image, images, tags, status,
resource_id, crawl_time, process_time, processor,
create_time, update_time, deleted
)
VALUES
<foreach collection="itemList" item="item" separator=",">
(
#{item.id}, #{item.taskId}, #{item.logId}, #{item.title}, #{item.content},
#{item.summary}, #{item.source}, #{item.sourceUrl}, #{item.category},
#{item.author}, #{item.publishTime}, #{item.coverImage}, #{item.images},
#{item.tags}, #{item.status}, #{item.resourceId}, #{item.crawlTime},
#{item.processTime}, #{item.processor},
NOW(), NOW(), 0
)
</foreach>
</insert>
<!-- ==================== VO queries (use JOINs to return the full VO) ==================== -->
<!-- Query a collection item VO by ID -->
<select id="selectVOById" resultMap="VOResultMap">
SELECT
<include refid="VO_Column_List" />
FROM tb_data_collection_item i
LEFT JOIN tb_crontab_task t ON i.task_id = t.task_id
LEFT JOIN tb_crontab_log l ON i.log_id = l.id
WHERE i.id = #{itemId}
AND i.deleted = 0
</select>
<!-- Query the collection item VO list -->
<select id="selectVOList" resultMap="VOResultMap">
SELECT
<include refid="VO_Column_List" />
FROM tb_data_collection_item i
LEFT JOIN tb_crontab_task t ON i.task_id = t.task_id
LEFT JOIN tb_crontab_log l ON i.log_id = l.id
<where>
i.deleted = 0
<if test="_parameter != null">
<if test="id != null and id != ''">
AND i.id = #{id}
</if>
<if test="taskId != null and taskId != ''">
AND i.task_id = #{taskId}
</if>
<if test="logId != null and logId != ''">
AND i.log_id = #{logId}
</if>
<if test="title != null and title != ''">
AND i.title LIKE CONCAT('%', #{title}, '%')
</if>
<if test="source != null and source != ''">
AND i.source = #{source}
</if>
<if test="sourceUrl != null and sourceUrl != ''">
AND i.source_url = #{sourceUrl}
</if>
<if test="category != null and category != ''">
AND i.category = #{category}
</if>
<if test="author != null and author != ''">
AND i.author LIKE CONCAT('%', #{author}, '%')
</if>
<if test="status != null">
AND i.status = #{status}
</if>
<if test="resourceId != null and resourceId != ''">
AND i.resource_id = #{resourceId}
</if>
<if test="processor != null and processor != ''">
AND i.processor = #{processor}
</if>
</if>
</where>
ORDER BY i.create_time DESC
</select>
<!-- Query the collection item VO list with pagination -->
<select id="selectVOPage" resultMap="VOResultMap">
SELECT
<include refid="VO_Column_List" />
FROM tb_data_collection_item i
LEFT JOIN tb_crontab_task t ON i.task_id = t.task_id
LEFT JOIN tb_crontab_log l ON i.log_id = l.id
<where>
i.deleted = 0
<if test="filter != null">
<if test="filter.id != null and filter.id != ''">
AND i.id = #{filter.id}
</if>
<if test="filter.taskId != null and filter.taskId != ''">
AND i.task_id = #{filter.taskId}
</if>
<if test="filter.logId != null and filter.logId != ''">
AND i.log_id = #{filter.logId}
</if>
<if test="filter.title != null and filter.title != ''">
AND i.title LIKE CONCAT('%', #{filter.title}, '%')
</if>
<if test="filter.source != null and filter.source != ''">
AND i.source = #{filter.source}
</if>
<if test="filter.sourceUrl != null and filter.sourceUrl != ''">
AND i.source_url = #{filter.sourceUrl}
</if>
<if test="filter.category != null and filter.category != ''">
AND i.category = #{filter.category}
</if>
<if test="filter.author != null and filter.author != ''">
AND i.author LIKE CONCAT('%', #{filter.author}, '%')
</if>
<if test="filter.status != null">
AND i.status = #{filter.status}
</if>
<if test="filter.resourceId != null and filter.resourceId != ''">
AND i.resource_id = #{filter.resourceId}
</if>
<if test="filter.processor != null and filter.processor != ''">
AND i.processor = #{filter.processor}
</if>
</if>
</where>
ORDER BY i.create_time DESC
LIMIT #{pageParam.pageSize} OFFSET #{pageParam.offset}
</select>
<!-- Query the collection item VO list by task ID -->
<select id="selectVOByTaskId" resultMap="VOResultMap">
SELECT
<include refid="VO_Column_List" />
FROM tb_data_collection_item i
LEFT JOIN tb_crontab_task t ON i.task_id = t.task_id
LEFT JOIN tb_crontab_log l ON i.log_id = l.id
WHERE i.task_id = #{taskId}
AND i.deleted = 0
ORDER BY i.create_time DESC
</select>
</mapper>
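
The paged queries in this mapper bind `#{pageParam.pageSize}` and `#{pageParam.offset}` into `LIMIT ... OFFSET ...`. The diff does not show how `offset` is derived, so the following is only a minimal sketch, assuming it is computed from a 1-based `pageNumber` as `(pageNumber - 1) * pageSize`. `PageWindow` and `toPageWindow` are hypothetical names, not part of this repository.

```typescript
// Hypothetical helper: derive LIMIT/OFFSET values from a 1-based page number.
// Assumption: pageParam.offset in the mapper equals (pageNumber - 1) * pageSize.
interface PageWindow {
  pageSize: number;
  offset: number;
}

function toPageWindow(pageNumber?: number, pageSize?: number): PageWindow {
  // Fall back to page 1 / size 10, the defaults the frontend API wrapper sends.
  const size = pageSize && pageSize > 0 ? Math.floor(pageSize) : 10;
  const page = pageNumber && pageNumber > 0 ? Math.floor(pageNumber) : 1;
  return { pageSize: size, offset: (page - 1) * size };
}
```

Clamping invalid input to the defaults keeps a negative or zero page number from producing a negative `OFFSET`, which most SQL dialects reject.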
@@ -26,6 +26,7 @@ import org.xyzh.common.vo.UserDeptRoleVO;
import org.xyzh.common.core.enums.ResourceType;
import java.util.ArrayList;
import java.util.Arrays;
import java.util.Date;
import java.util.List;
import java.util.stream.Collectors;
@@ -270,7 +271,7 @@ public class NCResourceServiceImpl implements ResourceService {
}
// Check whether the resource exists
TbResource existing = resourceMapper.selectById(resource.getResourceID());
TbResource existing = resourceMapper.selectByResourceId(resource.getResourceID());
if (existing == null || existing.getDeleted()) {
resultDomain.fail("资源不存在");
return resultDomain;
@@ -286,33 +287,17 @@ public class NCResourceServiceImpl implements ResourceService {
}
}
Date now = new Date();
// Original tags
TbResourceTag filter = new TbResourceTag();
filter.setResourceID(resource.getResourceID());
List<TagVO> originalTagVOs = resourceTagMapper.selectResourceTags(filter);
List<TbResourceTag> originalTags = originalTagVOs.stream().map(TagVO::getResourceTag).collect(Collectors.toList());
// Current tags
List<TbTag> currentTags = resourceVO.getTags();
// Tags to add
List<TbTag> tagsToAdd = currentTags.stream()
.filter(tag -> originalTags.stream().noneMatch(originalTag -> originalTag.getTagID().equals(tag.getID())))
.collect(Collectors.toList());
// Tags to delete
List<TbResourceTag> tagsToDelete = originalTags.stream()
.filter(originalTag -> currentTags.stream().noneMatch(tag -> tag.getID().equals(originalTag.getTagID())))
.collect(Collectors.toList());
resourceTagMapper.batchDeleteResourceTags(tagsToDelete.stream().map(TbResourceTag::getID).collect(Collectors.toList()));
resourceTagMapper.batchInsertResourceTags(tagsToAdd.stream().map(tag -> {
// Tags: delete the existing rows first, then insert
TbResourceTag resourceTag = new TbResourceTag();
resourceTag.setResourceID(resource.getResourceID());
resourceTag.setTagID(tag.getID());
resourceTag.setID(IDUtils.generateID());
resourceTag.setResourceID(resource.getResourceID());
resourceTag.setCreator(user.getID());
resourceTag.setCreateTime(now);
return resourceTag;
}).collect(Collectors.toList()));
resourceTag.setDeleted(false);
resourceTag.setTagID(resourceVO.getResource().getTagID());
resourceTagMapper.deleteByResourceId(resource.getResourceID());
resourceTagMapper.batchInsertResourceTags(Arrays.asList(resourceTag));
// Update time
resource.setUpdateTime(now);
@@ -321,10 +306,10 @@ public class NCResourceServiceImpl implements ResourceService {
if (result > 0) {
logger.info("更新资源成功: {}", resource.getResourceID());
// Re-query to return the complete data
TbResource updated = resourceMapper.selectById(resource.getResourceID());
TbResource updated = resourceMapper.selectByResourceId(resource.getResourceID());
ResourceVO updatedResourceVO = new ResourceVO();
updatedResourceVO.setResource(updated);
updatedResourceVO.setTags(currentTags);
updatedResourceVO.setTags(resourceVO.getTags());
resultDomain.success("更新资源成功", updatedResourceVO);
return resultDomain;
} else {
@@ -403,7 +388,7 @@ public class NCResourceServiceImpl implements ResourceService {
if (result > 0) {
logger.info("更新资源状态成功: {}", resourceID);
// Re-query to return the complete data
TbResource updated = resourceMapper.selectById(resource.getID());
TbResource updated = resourceMapper.selectByResourceId(resource.getID());
resultDomain.success("更新资源状态成功", updated);
return resultDomain;
} else {
@@ -553,7 +538,7 @@ public class NCResourceServiceImpl implements ResourceService {
if (result > 0) {
logger.info("增加资源点赞次数成功: {}", resourceID);
// Re-query to return the complete data
TbResource updated = resourceMapper.selectById(resource.getID());
TbResource updated = resourceMapper.selectByResourceId(resource.getID());
resultDomain.success("增加点赞次数成功", updated);
return resultDomain;
} else {
@@ -625,7 +610,7 @@ public class NCResourceServiceImpl implements ResourceService {
if (result > 0) {
logger.info("设置资源推荐状态成功: {} -> {}", resourceID, isRecommend);
// Re-query to return the complete data
TbResource updated = resourceMapper.selectById(resource.getID());
TbResource updated = resourceMapper.selectByResourceId(resource.getID());
resultDomain.success("设置推荐状态成功", updated);
return resultDomain;
} else {
@@ -669,7 +654,7 @@ public class NCResourceServiceImpl implements ResourceService {
if (result > 0) {
logger.info("设置资源轮播状态成功: {} -> {}", resourceID, isBanner);
// Re-query to return the complete data
TbResource updated = resourceMapper.selectById(resource.getID());
TbResource updated = resourceMapper.selectByResourceId(resource.getID());
resultDomain.success("设置轮播状态成功", updated);
return resultDomain;
} else {
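
The update hunk in this file appears to drop the per-tag add/remove diff in favor of deleting every tag row for the resource and inserting a single row (the +/- markers are missing from this export, so that reading is inferred from the line order). For comparison, the removed diffing approach can be sketched as below. `ResourceTagLink`, `Tag` and `diffTags` are illustrative stand-ins, not types from this repository.

```typescript
// Illustrative shapes only; the real entities are TbResourceTag and TbTag.
interface ResourceTagLink {
  id: string;    // primary key of the link row
  tagId: string; // tag the resource is linked to
}

interface Tag {
  id: string;
}

// Compute which tags need a new link row and which link rows should be deleted.
function diffTags(original: ResourceTagLink[], current: Tag[]) {
  const originalTagIds = new Set(original.map(link => link.tagId));
  const currentTagIds = new Set(current.map(tag => tag.id));
  return {
    toAdd: current.filter(tag => !originalTagIds.has(tag.id)),
    toDelete: original.filter(link => !currentTagIds.has(link.tagId)),
  };
}
```

Delete-then-insert is simpler and is a reasonable fit when a resource carries exactly one tag; the diff form only pays off when several tags can change independently.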
@@ -5,7 +5,7 @@
*/
import { api } from '@/apis/index';
import type { CrontabTask, CrontabLog, ResultDomain, PageParam } from '@/types';
import type { CrontabTask, CrontabLog, DataCollectionItem, CrontabItem, ResultDomain, PageParam } from '@/types';
/**
* Scheduled task API service
@@ -15,13 +15,22 @@ export const crontabApi = {
// ==================== Scheduled task management ====================
/**
* Get the list of scheduled task templates available for creation
* @returns Promise<ResultDomain<CrontabItem>>
*/
async getEnabledCrontabList(): Promise<ResultDomain<CrontabItem>> {
const response = await api.get<CrontabItem>(`${this.baseUrl}/getEnabledCrontabList`);
return response.data;
},
/**
* Create a scheduled task
* @param task Task object
* @returns Promise<ResultDomain<CrontabTask>>
*/
async createTask(task: CrontabTask): Promise<ResultDomain<CrontabTask>> {
const response = await api.post<CrontabTask>(`${this.baseUrl}/task`, task);
const response = await api.post<CrontabTask>(`${this.baseUrl}/crontabTask`, task);
return response.data;
},
@@ -31,7 +40,7 @@ export const crontabApi = {
* @returns Promise<ResultDomain<CrontabTask>>
*/
async updateTask(task: CrontabTask): Promise<ResultDomain<CrontabTask>> {
const response = await api.put<CrontabTask>(`${this.baseUrl}/task`, task);
const response = await api.put<CrontabTask>(`${this.baseUrl}/crontabTask`, task);
return response.data;
},
@@ -41,7 +50,7 @@ export const crontabApi = {
* @returns Promise<ResultDomain<CrontabTask>>
*/
async deleteTask(task: CrontabTask): Promise<ResultDomain<CrontabTask>> {
const response = await api.delete<CrontabTask>(`${this.baseUrl}/task`, task);
const response = await api.delete<CrontabTask>(`${this.baseUrl}/crontabTask`, task);
return response.data;
},
@@ -72,11 +81,11 @@ export const crontabApi = {
* @returns Promise<ResultDomain<CrontabTask>>
*/
async getTaskPage(filter?: Partial<CrontabTask>, pageParam?: PageParam): Promise<ResultDomain<CrontabTask>> {
const response = await api.post<CrontabTask>(`${this.baseUrl}/task/page`, {
const response = await api.post<CrontabTask>(`${this.baseUrl}/crontabTaskPage`, {
filter,
pageParam: {
pageNumber: pageParam?.page || 1,
pageSize: pageParam?.size || 10
pageNumber: pageParam?.pageNumber || 1,
pageSize: pageParam?.pageSize || 10
}
});
return response.data;
@@ -153,11 +162,11 @@ export const crontabApi = {
* @returns Promise<ResultDomain<CrontabLog>>
*/
async getLogPage(filter?: Partial<CrontabLog>, pageParam?: PageParam): Promise<ResultDomain<CrontabLog>> {
const response = await api.post<CrontabLog>(`${this.baseUrl}/log/page`, {
const response = await api.post<CrontabLog>(`${this.baseUrl}/crontabTaskLogPage`, {
filter,
pageParam: {
pageNumber: pageParam?.page || 1,
pageSize: pageParam?.size || 10
pageNumber: pageParam?.pageNumber || 1,
pageSize: pageParam?.pageSize || 10
}
});
return response.data;
@@ -191,6 +200,49 @@ export const crontabApi = {
async deleteLog(log: CrontabLog): Promise<ResultDomain<CrontabLog>> {
const response = await api.delete<CrontabLog>(`${this.baseUrl}/log`, log);
return response.data;
},
// ==================== Data collection item management ====================
/**
* Query the list of data collection items by task log ID
* @param taskLogId Task log ID
* @returns Promise<ResultDomain<DataCollectionItem>>
*/
async getCollectionItemsByLogId(taskLogId: string): Promise<ResultDomain<DataCollectionItem>> {
const response = await api.get<DataCollectionItem>(`${this.baseUrl}/collection/item/task/${taskLogId}`);
return response.data;
},
/**
* Query the list of data collection items with pagination
* @param filter Filter conditions
* @param pageParam Pagination parameters
* @returns Promise<ResultDomain<DataCollectionItem>>
*/
async getCollectionItemPage(filter?: Partial<DataCollectionItem>, pageParam?: PageParam): Promise<ResultDomain<DataCollectionItem>> {
const response = await api.post<DataCollectionItem>(`${this.baseUrl}/collection/item/page`, {
filter,
pageParam: {
pageNumber: pageParam?.pageNumber || 1,
pageSize: pageParam?.pageSize || 10
}
});
return response.data;
},
/**
* Convert a collection item into a resource article
* @param itemId Collection item ID
* @param tagId Tag ID
* @returns Promise<ResultDomain<string>>
*/
async convertItemToResource(itemId: string, tagId: string): Promise<ResultDomain<string>> {
const response = await api.post<string>(`${this.baseUrl}/collection/item/resource`, {
itemId,
tagId
});
return response.data;
}
};
@@ -243,7 +243,7 @@ watch(
background: white;
border-radius: 4px;
box-shadow: 0 1px 4px rgba(0, 21, 41, 0.08);
height: calc(100vh - 76px);
min-height: calc(100vh - 76px);
}
</style>
@@ -210,7 +210,7 @@ function handleMenuClick(menu: SysMenu) {
.main-content-full {
background: #F9FAFB;
height: 100vh;
min-height: 100vh;
overflow-y: auto;
padding: 20px;
box-sizing: border-box;
@@ -42,6 +42,8 @@ export interface CrontabTask extends BaseDTO {
* Scheduled task execution log
*/
export interface CrontabLog extends BaseDTO {
/** Log ID */
logId?: string;
/** Task ID */
taskId?: string;
/** Task name */
@@ -90,3 +92,93 @@ export interface NewsCrawlerConfig {
status?: number;
}
/**
* Data collection item
*/
export interface DataCollectionItem extends BaseDTO {
/** Collection item ID */
itemId?: string;
/** Log ID */
logId?: string;
/** Task ID */
taskId?: string;
/** Task name */
taskName?: string;
/** Title */
title?: string;
/** Content (HTML format) */
content?: string;
/** Source URL */
sourceUrl?: string;
/** Publish time */
publishTime?: string;
/** Author */
author?: string;
/** Summary */
summary?: string;
/** Cover image */
coverImage?: string;
/** Category */
category?: string;
/** Source (People's Daily, Xinhua News Agency, etc.) */
source?: string;
/** Tags (comma-separated when there are several) */
tags?: string;
/** Image list (JSON format) */
images?: string;
/** Status (0: unprocessed, 1: converted, 2: ignored) */
status?: number;
/** Conversion time */
convertTime?: string;
/** ID of the converted resource */
resourceId?: string;
/** Error message */
errorMessage?: string;
/** Crawl time */
crawlTime?: string;
/** Processing time */
processTime?: string;
/** Processor */
processor?: string;
}
/**
* Crawler task parameter
*/
export interface CrontabParam {
/** Parameter name */
name: string;
/** Parameter description */
description: string;
/** Parameter type */
type: string;
/** Default value */
value: any;
}
/**
* Crawler task template method
*/
export interface CrontabMethod {
/** Method name */
name: string;
/** Bean class name */
clazz?: string;
/** Name of the method to execute */
excuete_method?: string;
/** Python script path */
path: string;
/** List of parameter definitions */
params?: CrontabParam[];
}
/**
* Crawler task template item
*/
export interface CrontabItem {
/** Template name */
name: string;
/** List of available methods */
methods: CrontabMethod[];
}
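
The `CrontabParam` list on a method drives both the default values and the required-field checks in the crawler form. A minimal sketch of those two steps as pure functions follows; it redeclares a local `CrontabParam` with `unknown` instead of `any`, and `defaultParams` / `firstMissingParam` are hypothetical helper names. The view in this commit does the same work inline, so treat this as an illustration rather than the project's implementation.

```typescript
// Local copy of the parameter shape, with `unknown` in place of `any`.
interface CrontabParam {
  name: string;
  description: string;
  type: string;
  value: unknown;
}

// Build the initial form model from the declared default values.
function defaultParams(params?: CrontabParam[]): Record<string, unknown> {
  const model: Record<string, unknown> = {};
  for (const param of params ?? []) {
    model[param.name] = param.value;
  }
  return model;
}

// Return the first String/Integer parameter that has no usable value, if any.
// Boolean parameters are never reported, since a switch always has a value.
function firstMissingParam(
  params: CrontabParam[] | undefined,
  values: Record<string, unknown>
): CrontabParam | undefined {
  return (params ?? []).find(param => {
    const value = values[param.name];
    if (param.type === 'String') {
      return typeof value !== 'string' || value.trim() === '';
    }
    if (param.type === 'Integer') {
      return value === undefined || value === null || value === '';
    }
    return false;
  });
}
```

Keeping these checks outside the component makes them easy to unit-test without mounting the dialog.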
@@ -118,9 +118,18 @@
<el-dialog
v-model="detailDialogVisible"
title="执行日志详情"
width="700px"
width="900px"
:close-on-click-modal="false"
>
<div class="detail-content" v-if="currentLog">
<!-- Basic log information -->
<el-card class="detail-card" shadow="never">
<template #header>
<div class="card-header-title">
<span>执行信息</span>
</div>
</template>
<div class="detail-grid">
<div class="detail-item">
<span class="detail-label">任务名称</span>
<span class="detail-value">{{ currentLog.taskName }}</span>
@@ -159,15 +168,98 @@
<span class="detail-label">结束时间</span>
<span class="detail-value">{{ currentLog.endTime }}</span>
</div>
<div class="detail-item" v-if="currentLog.executeMessage">
<div class="detail-item full-width" v-if="currentLog.executeMessage">
<span class="detail-label">执行结果</span>
<div class="detail-message">{{ currentLog.executeMessage }}</div>
</div>
<div class="detail-item" v-if="currentLog.exceptionInfo">
<div class="detail-item full-width" v-if="currentLog.exceptionInfo">
<span class="detail-label">异常信息</span>
<div class="detail-exception">{{ currentLog.exceptionInfo }}</div>
</div>
</div>
</el-card>
<!-- Collected news data -->
<el-card class="detail-card" shadow="never" style="margin-top: 20px">
<template #header>
<div class="card-header-title">
<span>采集数据</span>
<el-tag size="small" type="info"> {{ collectionItems.length }} </el-tag>
</div>
</template>
<div v-loading="loadingItems">
<!-- Empty-state hint -->
<el-empty
v-if="!loadingItems && collectionItems.length === 0"
description="暂无采集数据"
:image-size="80"
/>
<!-- News list -->
<div v-else class="news-list">
<div
v-for="(item, index) in collectionItems"
:key="item.id"
class="news-item"
>
<div class="news-header">
<span class="news-index">#{{ index + 1 }}</span>
<el-tag
v-if="item.status === 0"
type="info"
size="small"
>
未处理
</el-tag>
<el-tag
v-else-if="item.status === 1"
type="success"
size="small"
>
已转换
</el-tag>
<el-tag
v-else
type="warning"
size="small"
>
已忽略
</el-tag>
</div>
<h4 class="news-title">{{ item.title }}</h4>
<div class="news-meta">
<span v-if="item.source">来源: {{ item.source }}</span>
<span v-if="item.author">作者: {{ item.author }}</span>
<span v-if="item.publishTime">发布: {{ item.publishTime }}</span>
<span v-if="item.category">分类: {{ item.category }}</span>
</div>
<div v-if="item.summary" class="news-summary">
{{ item.summary }}
</div>
<div class="news-footer">
<el-link
v-if="item.sourceUrl"
:href="item.sourceUrl"
target="_blank"
type="primary"
:underline="false"
>
查看原文
</el-link>
<span v-if="item.crawlTime" class="crawl-time">
采集时间: {{ item.crawlTime }}
</span>
</div>
</div>
</div>
</div>
</el-card>
</div>
<template #footer>
<el-button @click="detailDialogVisible = false">关闭</el-button>
@@ -222,7 +314,7 @@ import { ref, reactive, onMounted } from 'vue';
import { ElMessage, ElMessageBox } from 'element-plus';
import { Delete, Search, Refresh } from '@element-plus/icons-vue';
import { crontabApi } from '@/apis/crontab';
import type { CrontabLog, PageParam } from '@/types';
import type { CrontabLog, PageParam, DataCollectionItem } from '@/types';
import { AdminLayout } from '@/views/admin';
defineOptions({
name: 'LogManagementView'
@@ -233,6 +325,8 @@ const submitting = ref(false);
const logList = ref<CrontabLog[]>([]);
const total = ref(0);
const currentLog = ref<CrontabLog | null>(null);
const collectionItems = ref<DataCollectionItem[]>([]);
const loadingItems = ref(false);
// Search form
const searchForm = reactive({
@@ -262,9 +356,18 @@ async function loadLogList() {
if (searchForm.executeStatus !== undefined) filter.executeStatus = searchForm.executeStatus;
const result = await crontabApi.getLogPage(filter, pageParam);
if (result.success && result.dataList) {
if (result.success) {
// Handle the data according to the structure the backend returns
if (result.pageDomain) {
logList.value = result.pageDomain.dataList || [];
total.value = result.pageDomain.pageParam?.totalElements || 0;
} else if (result.dataList) {
logList.value = result.dataList;
total.value = result.pageParam?.totalElements || 0;
} else {
logList.value = [];
total.value = 0;
}
} else {
ElMessage.error(result.message || '加载日志列表失败');
logList.value = [];
@@ -310,16 +413,36 @@ function handleSizeChange(size: number) {
// View details
async function handleViewDetail(row: CrontabLog) {
try {
const result = await crontabApi.getLogById(row.id!);
if (result.success && result.data) {
currentLog.value = result.data;
detailDialogVisible.value = true;
// Load the log details and the collection items in parallel
loadingItems.value = true;
collectionItems.value = [];
const [logResult, itemsResult] = await Promise.all([
crontabApi.getLogById(row.id!),
crontabApi.getCollectionItemsByLogId(row.id!)
]);
if (logResult.success && logResult.data) {
currentLog.value = logResult.data;
} else {
ElMessage.error(result.message || '获取详情失败');
ElMessage.error(logResult.message || '获取日志详情失败');
return;
}
if (itemsResult.success) {
collectionItems.value = itemsResult.dataList || [];
} else {
console.warn('获取采集项失败:', itemsResult.message);
// Still show the log details even if loading the collection items fails
collectionItems.value = [];
}
detailDialogVisible.value = true;
} catch (error) {
console.error('获取日志详情失败:', error);
ElMessage.error('获取日志详情失败');
} finally {
loadingItems.value = false;
}
}
@@ -432,12 +555,38 @@ onMounted(() => {
}
.detail-content {
.detail-card {
margin-bottom: 20px;
.card-header-title {
display: flex;
justify-content: space-between;
align-items: center;
font-weight: 600;
font-size: 16px;
color: #303133;
}
}
.detail-grid {
display: grid;
grid-template-columns: repeat(2, 1fr);
gap: 16px;
.detail-item {
display: flex;
align-items: flex-start;
margin-bottom: 16px;
font-size: 14px;
&.full-width {
grid-column: 1 / -1;
flex-direction: column;
.detail-label {
margin-bottom: 8px;
}
}
.detail-label {
min-width: 100px;
color: #606266;
@@ -452,7 +601,7 @@ onMounted(() => {
.detail-message,
.detail-exception {
flex: 1;
width: 100%;
padding: 12px;
background-color: #f5f7fa;
border-radius: 4px;
@@ -472,6 +621,103 @@ onMounted(() => {
}
}
.news-list {
max-height: 500px;
overflow-y: auto;
.news-item {
padding: 16px;
margin-bottom: 16px;
background-color: #f8f9fa;
border-radius: 8px;
border-left: 4px solid #409eff;
transition: all 0.3s;
&:hover {
background-color: #ecf5ff;
box-shadow: 0 2px 8px rgba(0, 0, 0, 0.08);
}
&:last-child {
margin-bottom: 0;
}
.news-header {
display: flex;
align-items: center;
gap: 12px;
margin-bottom: 12px;
.news-index {
font-size: 12px;
font-weight: 600;
color: #409eff;
background-color: #ecf5ff;
padding: 2px 8px;
border-radius: 4px;
}
}
.news-title {
margin: 0 0 12px 0;
font-size: 16px;
font-weight: 600;
color: #303133;
line-height: 1.5;
}
.news-meta {
display: flex;
flex-wrap: wrap;
gap: 16px;
margin-bottom: 12px;
font-size: 13px;
color: #909399;
span {
display: inline-flex;
align-items: center;
&:not(:last-child)::after {
content: '|';
margin-left: 16px;
color: #dcdfe6;
}
}
}
.news-summary {
margin-bottom: 12px;
padding: 12px;
background-color: #fff;
border-radius: 4px;
font-size: 14px;
color: #606266;
line-height: 1.6;
max-height: 80px;
overflow: hidden;
text-overflow: ellipsis;
display: -webkit-box;
-webkit-line-clamp: 3;
-webkit-box-orient: vertical;
}
.news-footer {
display: flex;
justify-content: space-between;
align-items: center;
padding-top: 12px;
border-top: 1px solid #e4e7ed;
.crawl-time {
font-size: 12px;
color: #909399;
}
}
}
}
}
.clean-dialog-content {
.clean-item {
display: flex;
@@ -55,9 +55,14 @@
</el-button>
</div>
</div>
<!-- Empty state -->
<el-empty
v-if="!loading && crawlerList.length === 0"
description="暂无爬虫配置"
style="margin-top: 40px"
/>
<!-- Crawler configuration list -->
<div class="crawler-list">
<div v-else class="crawler-list">
<el-row :gutter="20">
<el-col :span="8" v-for="crawler in crawlerList" :key="crawler.taskId">
<el-card class="crawler-card" shadow="hover">
@@ -146,13 +151,6 @@
</el-row>
</div>
<!-- Empty state -->
<el-empty
v-if="!loading && crawlerList.length === 0"
description="暂无爬虫配置"
style="margin-top: 40px"
/>
<!-- Pagination -->
<div class="pagination-container" v-if="total > 0">
<el-pagination
@@ -182,35 +180,81 @@
clearable
/>
</div>
<!-- Crawler template selection -->
<div class="form-item">
<span class="form-label required">Bean名称</span>
<el-input
v-model="formData.beanName"
placeholder="请输入Spring Bean名称newsCrawlerTask"
clearable
/>
</div>
<div class="form-item">
<span class="form-label required">方法名称</span>
<el-input
v-model="formData.methodName"
placeholder="请输入要执行的方法名crawlNews"
clearable
/>
</div>
<div class="form-item">
<span class="form-label">方法参数</span>
<el-input
v-model="formData.methodParams"
type="textarea"
:rows="3"
placeholder="请输入方法参数JSON格式可选"
clearable
<span class="form-label required">爬虫模板</span>
<el-select
v-model="selectedTemplate"
placeholder="请选择爬虫模板"
style="width: 100%"
>
<el-option
v-for="template in crawlerTemplates"
:key="template.name"
:label="template.name"
:value="template"
/>
</el-select>
<span class="form-tip">
示例{"source":"xinhua","category":"education"}
选择要使用的新闻爬虫类型
</span>
</div>
<!-- Crawl method selection -->
<div class="form-item" v-if="selectedTemplate">
<span class="form-label required">爬取方法</span>
<el-select
v-model="selectedMethod"
placeholder="请选择爬取方法"
style="width: 100%"
>
<el-option
v-for="method in selectedTemplate.methods"
:key="method.name"
:label="method.name"
:value="method"
/>
</el-select>
<span class="form-tip">
选择具体的爬取方式
</span>
</div>
<!-- Dynamic parameter form -->
<div class="form-item" v-if="selectedMethod && selectedMethod.params && selectedMethod.params.length > 0">
<span class="form-label">方法参数</span>
<div class="params-container">
<div v-for="param in selectedMethod.params" :key="param.name" class="param-item">
<span class="param-label">
{{ param.description }}
<span class="param-type">({{ param.type }})</span>
</span>
<el-input
v-if="param.type === 'String'"
v-model="dynamicParams[param.name]"
:placeholder="`请输入${param.description}`"
clearable
/>
<el-input-number
v-else-if="param.type === 'Integer'"
v-model="dynamicParams[param.name]"
:placeholder="`请输入${param.description}`"
controls-position="right"
style="width: 100%"
/>
<el-switch
v-else-if="param.type === 'Boolean'"
v-model="dynamicParams[param.name]"
active-text=""
inactive-text=""
/>
</div>
</div>
</div>
<div class="form-item">
<span class="form-label required">Cron表达式</span>
<el-input
@@ -266,11 +310,11 @@
</template>
<script setup lang="ts">
import { ref, reactive, onMounted } from 'vue';
import { ref, reactive, onMounted, watch } from 'vue';
import { ElMessage, ElMessageBox } from 'element-plus';
import { Plus, Search, Refresh, DocumentCopy, VideoPlay, VideoPause, Promotion, Edit, Delete } from '@element-plus/icons-vue';
import { crontabApi } from '@/apis/crontab';
import type { CrontabTask, PageParam } from '@/types';
import type { CrontabTask, CrontabItem, CrontabMethod, PageParam } from '@/types';
import { AdminLayout } from '@/views/admin';
defineOptions({
name: 'NewsCrawlerView'
@@ -281,6 +325,12 @@ const submitting = ref(false);
const crawlerList = ref<CrontabTask[]>([]);
const total = ref(0);
// Crawler template data
const crawlerTemplates = ref<CrontabItem[]>([]);
const selectedTemplate = ref<CrontabItem | null>(null);
const selectedMethod = ref<CrontabMethod | null>(null);
const dynamicParams = ref<Record<string, any>>({});
// Search form
const searchForm = reactive({
taskName: '',
@@ -300,7 +350,7 @@ const isEdit = ref(false);
// Form data
const formData = reactive<Partial<CrontabTask>>({
taskName: '',
taskGroup: 'NEWS_CRAWLER',
taskGroup: '',
beanName: '',
methodName: '',
methodParams: '',
@@ -311,21 +361,65 @@ const formData = reactive<Partial<CrontabTask>>({
description: ''
});
// Watch for template selection changes
watch(selectedTemplate, (newTemplate) => {
if (newTemplate) {
selectedMethod.value = null;
dynamicParams.value = {};
}
});
// Watch for method selection changes
watch(selectedMethod, (newMethod) => {
if (newMethod) {
dynamicParams.value = {};
// Iterate over the params array and extract the default values
if (newMethod.params && Array.isArray(newMethod.params)) {
newMethod.params.forEach(param => {
dynamicParams.value[param.name] = param.value;
});
}
}
});
// Load crawler templates
async function loadCrawlerTemplates() {
try {
const result = await crontabApi.getEnabledCrontabList();
if (result.success && result.dataList) {
crawlerTemplates.value = result.dataList;
} else {
ElMessage.error(result.message || '加载爬虫模板失败');
}
} catch (error) {
console.error('加载爬虫模板失败:', error);
ElMessage.error('加载爬虫模板失败');
}
}
// Load the crawler list
async function loadCrawlerList() {
loading.value = true;
try {
const filter: Partial<CrontabTask> = {
taskGroup: 'NEWS_CRAWLER'
taskGroup: ''
};
if (searchForm.taskName) filter.taskName = searchForm.taskName;
if (searchForm.status !== undefined) filter.status = searchForm.status;
const result = await crontabApi.getTaskPage(filter, pageParam);
if (result.success && result.dataList) {
const pageDomain = result.pageDomain!;
crawlerList.value = pageDomain.dataList!;
total.value = pageDomain.pageParam.totalElements!;
if (result.success) {
// Handle the data according to the structure the backend returns
if (result.pageDomain) {
crawlerList.value = result.pageDomain.dataList || [];
total.value = result.pageDomain.pageParam?.totalElements || 0;
} else if (result.dataList) {
crawlerList.value = result.dataList;
total.value = result.pageParam?.totalElements || 0;
} else {
crawlerList.value = [];
total.value = 0;
}
} else {
ElMessage.error(result.message || '加载爬虫列表失败');
crawlerList.value = [];
@@ -371,6 +465,9 @@ function handleSizeChange(size: number) {
function handleAdd() {
isEdit.value = false;
resetFormData();
selectedTemplate.value = null;
selectedMethod.value = null;
dynamicParams.value = {};
dialogVisible.value = true;
}
@@ -378,6 +475,32 @@ function handleAdd() {
function handleEdit(row: CrontabTask) {
isEdit.value = true;
Object.assign(formData, row);
// Try to parse methodParams to pre-fill the form
if (row.methodParams) {
try {
const params = JSON.parse(row.methodParams);
// If scriptPath is present, try to match the template and method
if (params.scriptPath) {
const template = crawlerTemplates.value.find(t =>
t.methods.some(m => m.path === params.scriptPath)
);
if (template) {
selectedTemplate.value = template;
const method = template.methods.find(m => m.path === params.scriptPath);
if (method) {
selectedMethod.value = method;
// Pre-fill the dynamic parameters
const { scriptPath, ...restParams } = params;
dynamicParams.value = restParams;
}
}
}
} catch (error) {
console.warn('解析methodParams失败:', error);
}
}
dialogVisible.value = true;
}
@@ -495,12 +618,8 @@ async function handleSubmit() {
ElMessage.warning('请输入爬虫名称');
return;
}
if (!formData.beanName) {
ElMessage.warning('请输入Bean名称');
return;
}
if (!formData.methodName) {
ElMessage.warning('请输入方法名称');
if (!selectedTemplate.value || !selectedMethod.value) {
ElMessage.warning('请选择爬虫模板和爬取方法');
return;
}
if (!formData.cronExpression) {
@@ -508,14 +627,39 @@ async function handleSubmit() {
return;
}
// Validate required parameters
if (selectedMethod.value.params && Array.isArray(selectedMethod.value.params)) {
for (const param of selectedMethod.value.params) {
const value = dynamicParams.value[param.name];
if (param.type === 'String' && (!value || value.trim() === '')) {
ElMessage.warning(`请输入${param.description}`);
return;
}
if (param.type === 'Integer' && (value === undefined || value === null || value === '')) {
ElMessage.warning(`请输入${param.description}`);
return;
}
}
}
submitting.value = true;
try {
const data = {
...formData,
taskGroup: 'NEWS_CRAWLER'
taskGroup: selectedTemplate.value.name, // The first-level name is used as taskGroup
methodName: selectedMethod.value.name, // The second-level name is used as methodName
methodParams: JSON.stringify({
scriptPath: selectedMethod.value.path,
...dynamicParams.value
})
};
let result;
console.log('📤 准备提交的数据:', data);
console.log('📤 taskGroup (模板名称):', data.taskGroup);
console.log('📤 methodName (方法名称):', data.methodName);
let result;
if (isEdit.value) {
result = await crontabApi.updateTask(data as CrontabTask);
} else {
@@ -546,7 +690,7 @@ function resetForm() {
function resetFormData() {
Object.assign(formData, {
taskName: '',
taskGroup: 'NEWS_CRAWLER',
taskGroup: '',
beanName: '',
methodName: '',
methodParams: '',
@@ -561,6 +705,7 @@ function resetFormData() {
// Initialize
onMounted(() => {
loadCrawlerList();
loadCrawlerTemplates();
});
</script>
@@ -569,7 +714,8 @@ onMounted(() => {
padding: 20px;
background-color: #fff;
border-radius: 4px;
max-height: 50%;
overflow: auto;
.header {
display: flex;
justify-content: space-between;
@@ -696,6 +842,35 @@ onMounted(() => {
color: #909399;
line-height: 1.6;
}
.params-container {
padding: 12px;
background-color: #f8f9fa;
border-radius: 4px;
border: 1px solid #e4e7ed;
.param-item {
margin-bottom: 16px;
&:last-child {
margin-bottom: 0;
}
.param-label {
display: block;
margin-bottom: 8px;
font-size: 13px;
color: #606266;
font-weight: 500;
.param-type {
color: #909399;
font-weight: normal;
font-size: 12px;
}
}
}
}
}
}
}
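
In this view, `handleSubmit` serializes `scriptPath` together with the dynamic parameters into `methodParams`, and `handleEdit` parses the same JSON back to pre-fill the form. The round trip can be sketched as a pair of helpers; `buildMethodParams` and `parseMethodParams` are hypothetical names, and the script path used in any call is made up for illustration.

```typescript
type DynamicParams = Record<string, unknown>;

// Serialize the script path plus user-entered parameters, as handleSubmit does.
function buildMethodParams(scriptPath: string, dynamicParams: DynamicParams): string {
  return JSON.stringify({ scriptPath, ...dynamicParams });
}

// Parse a stored methodParams string; returns null when it is not a JSON object.
function parseMethodParams(
  raw: string
): { scriptPath?: string; params: DynamicParams } | null {
  try {
    const parsed: unknown = JSON.parse(raw);
    if (parsed === null || typeof parsed !== 'object' || Array.isArray(parsed)) {
      return null;
    }
    // Split scriptPath from the remaining keys, mirroring handleEdit.
    const { scriptPath, ...rest } = parsed as Record<string, unknown>;
    return {
      scriptPath: typeof scriptPath === 'string' ? scriptPath : undefined,
      params: rest,
    };
  } catch {
    // Malformed JSON: let the caller fall back to an empty form.
    return null;
  }
}
```

Because the spread comes after `scriptPath`, a dynamic parameter that happens to be named `scriptPath` would overwrite the real path; reserving that name in the template definitions avoids the collision.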
@@ -280,7 +280,7 @@ const isEdit = ref(false);
// Form data
const formData = reactive<Partial<CrontabTask>>({
taskName: '',
taskGroup: 'DEFAULT',
taskGroup: '',
beanName: '',
methodName: '',
methodParams: '',
@@ -301,9 +301,18 @@ const loadTaskList = async () => {
if (searchForm.status !== undefined) filter.status = searchForm.status;
const result = await crontabApi.getTaskPage(filter, pageParam);
if (result.success && result.dataList) {
if (result.success) {
// Handle the data according to the structure the backend returns
if (result.pageDomain) {
taskList.value = result.pageDomain.dataList || [];
total.value = result.pageDomain.pageParam?.totalElements || 0;
} else if (result.dataList) {
taskList.value = result.dataList;
total.value = result.pageParam?.totalElements || 0;
} else {
taskList.value = [];
total.value = 0;
}
} else {
ElMessage.error(result.message || '加载任务列表失败');
taskList.value = [];
@@ -526,7 +535,7 @@ function resetForm() {
function resetFormData() {
Object.assign(formData, {
taskName: '',
taskGroup: 'DEFAULT',
taskGroup: '',
beanName: '',
methodName: '',
methodParams: '',
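
The `pageDomain` / `dataList` fallback in this file's list loader is repeated in the log and crawler views of the same commit. One way to share it is a small normalizer, sketched below. The `*Like` interfaces only model the fields those views read, and `extractPage` is a hypothetical name, not an existing utility in this codebase.

```typescript
// Minimal structural types covering only the fields the views access.
interface PageParamLike {
  totalElements?: number;
}

interface PageDomainLike<T> {
  dataList?: T[];
  pageParam?: PageParamLike;
}

interface ResultLike<T> {
  dataList?: T[];
  pageParam?: PageParamLike;
  pageDomain?: PageDomainLike<T>;
}

// Normalize either response shape into a list plus a total count.
function extractPage<T>(result: ResultLike<T>): { list: T[]; total: number } {
  if (result.pageDomain) {
    return {
      list: result.pageDomain.dataList || [],
      total: result.pageDomain.pageParam?.totalElements || 0,
    };
  }
  if (result.dataList) {
    return {
      list: result.dataList,
      total: result.pageParam?.totalElements || 0,
    };
  }
  return { list: [], total: 0 };
}
```

Each view would then assign `list` and `total` to its own refs after checking `result.success`, which keeps the three loaders in step if the backend response shape changes again.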
@@ -4,17 +4,651 @@
subtitle="管理文章、资源、数据等内容"
>
<div class="resource-management">
<el-empty description="请使用顶部标签页切换到对应的资源管理功能" />
<div class="header">
<h2>数据采集管理</h2>
<div class="header-actions">
<el-button type="primary" @click="handleRefresh">
<el-icon><Refresh /></el-icon>
刷新
</el-button>
</div>
</div>
<!-- Search and filter area -->
<div class="search-bar">
<!-- Task name search -->
<div class="search-item">
<span class="search-label">任务名称</span>
<el-input
v-model="searchForm.taskName"
placeholder="请输入任务名称"
clearable
style="width: 200px"
@keyup.enter="handleSearch"
/>
</div>
<!-- Log batch ID search -->
<div class="search-item">
<span class="search-label">批次ID</span>
<el-input
v-model="searchForm.logId"
placeholder="请输入批次ID"
clearable
style="width: 150px"
@keyup.enter="handleSearch"
/>
</div>
<!-- Search by title -->
<div class="search-item">
<span class="search-label">标题</span>
<el-input
v-model="searchForm.title"
placeholder="请输入标题"
clearable
style="width: 200px"
@keyup.enter="handleSearch"
/>
</div>
<!-- Search by source URL -->
<div class="search-item">
<span class="search-label">来源URL</span>
<el-input
v-model="searchForm.sourceUrl"
placeholder="请输入URL"
clearable
style="width: 200px"
@keyup.enter="handleSearch"
/>
</div>
<!-- Filter by status -->
<div class="search-item">
<span class="search-label">转换状态</span>
<el-select
v-model="searchForm.status"
placeholder="请选择状态"
clearable
style="width: 120px"
>
<el-option label="未处理" :value="0" />
<el-option label="已转换" :value="1" />
<el-option label="已忽略" :value="2" />
</el-select>
</div>
<!-- Search / reset buttons -->
<div class="search-actions">
<el-button type="primary" @click="handleSearch">
<el-icon><Search /></el-icon>
搜索
</el-button>
<el-button @click="handleReset">
<el-icon><Refresh /></el-icon>
重置
</el-button>
</div>
</div>
<!-- Data table -->
<el-table
:data="dataList"
v-loading="loading"
border
stripe
style="width: 100%"
>
<!-- Task name -->
<el-table-column
prop="taskName"
label="任务名称"
width="150"
fixed="left"
show-overflow-tooltip
/>
<!-- Log batch ID -->
<el-table-column
prop="logId"
label="批次ID"
width="100"
show-overflow-tooltip
/>
<!-- Source URL -->
<el-table-column label="来源URL" width="200">
<template #default="{ row }">
<el-link
v-if="row.sourceUrl"
:href="row.sourceUrl"
target="_blank"
type="primary"
:underline="false"
>
{{ truncateUrl(row.sourceUrl) }}
</el-link>
<span v-else>-</span>
</template>
</el-table-column>
<!-- Crawler parse result -->
<el-table-column label="解析结果" width="220">
<template #default="{ row }">
<div class="parse-result">
<div v-if="row.category" class="result-item">
<el-tag size="small" type="info">{{ row.category }}</el-tag>
</div>
<div v-if="row.source" class="result-item">
来源: {{ row.source }}
</div>
<div v-if="row.tags" class="result-item">
标签: {{ row.tags }}
</div>
</div>
</template>
</el-table-column>
<!-- Title -->
<el-table-column
prop="title"
label="标题"
min-width="250"
show-overflow-tooltip
/>
<!-- Author -->
<el-table-column
prop="author"
label="作者"
width="100"
show-overflow-tooltip
/>
<!-- Publish time -->
<el-table-column label="发布时间" width="160">
<template #default="{ row }">
{{ formatDateTime(row.publishTime) }}
</template>
</el-table-column>
<!-- Conversion status -->
<el-table-column label="转换状态" width="100">
<template #default="{ row }">
<el-tag
:type="getStatusTagType(row.status)"
size="small"
>
{{ getStatusText(row.status) }}
</el-tag>
</template>
</el-table-column>
<!-- Actions column -->
<el-table-column label="操作" width="260" fixed="right">
<template #default="{ row }">
<el-button
type="primary"
size="small"
@click="handleViewDetail(row)"
>
查看详情
</el-button>
<el-button
v-if="row.status === 0"
type="success"
size="small"
@click="handleConvert(row)"
>
转换为资源
</el-button>
</template>
</el-table-column>
</el-table>
<!-- Pagination -->
<div class="pagination-container" v-if="total > 0">
<el-pagination
v-model:current-page="pageParam.pageNumber"
v-model:page-size="pageParam.pageSize"
:page-sizes="[10, 20, 50, 100]"
:total="total"
layout="total, sizes, prev, pager, next, jumper"
@size-change="handleSizeChange"
@current-change="handlePageChange"
/>
</div>
<!-- Detail dialog -->
<el-dialog
v-model="detailDialogVisible"
title="数据采集详情"
width="900px"
:close-on-click-modal="false"
>
<div class="detail-content" v-if="currentItem">
<!-- Basic information -->
<el-descriptions title="基本信息" :column="2" border>
<el-descriptions-item label="标题" :span="2">
{{ currentItem.title }}
</el-descriptions-item>
<el-descriptions-item label="作者">
{{ currentItem.author || '未知' }}
</el-descriptions-item>
<el-descriptions-item label="发布时间">
{{ formatDateTime(currentItem.publishTime) }}
</el-descriptions-item>
<el-descriptions-item label="来源">
{{ currentItem.source || '-' }}
</el-descriptions-item>
<el-descriptions-item label="分类">
{{ currentItem.category || '-' }}
</el-descriptions-item>
<el-descriptions-item label="状态">
<el-tag :type="getStatusTagType(currentItem.status)">
{{ getStatusText(currentItem.status) }}
</el-tag>
</el-descriptions-item>
<el-descriptions-item label="任务名称">
{{ currentItem.taskName || '-' }}
</el-descriptions-item>
<el-descriptions-item label="来源URL" :span="2">
<el-link
v-if="currentItem.sourceUrl"
:href="currentItem.sourceUrl"
target="_blank"
type="primary"
>
{{ currentItem.sourceUrl }}
</el-link>
<span v-else>-</span>
</el-descriptions-item>
<el-descriptions-item label="标签" :span="2">
{{ currentItem.tags || '无' }}
</el-descriptions-item>
</el-descriptions>
<!-- Cover image -->
<div v-if="currentItem.coverImage" class="cover-section">
<h4>封面图片</h4>
<el-image
:src="currentItem.coverImage"
fit="cover"
style="width: 200px; height: 150px; border-radius: 4px"
:preview-src-list="[currentItem.coverImage]"
/>
</div>
<!-- Summary -->
<div v-if="currentItem.summary" class="summary-section">
<h4>摘要</h4>
<p>{{ currentItem.summary }}</p>
</div>
<!-- Body content, rendered as rich text -->
<div v-if="currentItem.content" class="content-section">
<h4>正文内容</h4>
<div class="content-display" v-html="currentItem.content"></div>
</div>
<!-- Conversion info -->
<div v-if="currentItem.status === 1" class="convert-info">
<h4>转换信息</h4>
<el-descriptions :column="2" border>
<el-descriptions-item label="资源ID">
{{ currentItem.resourceId || '-' }}
</el-descriptions-item>
<el-descriptions-item label="转换时间">
{{ formatDateTime(currentItem.processTime) }}
</el-descriptions-item>
<el-descriptions-item label="处理人" :span="2">
{{ currentItem.processor || '系统' }}
</el-descriptions-item>
</el-descriptions>
</div>
<!-- Error message -->
<div v-if="currentItem.status === 2 && currentItem.errorMessage" class="error-info">
<h4>错误信息</h4>
<el-alert type="error" :closable="false">
{{ currentItem.errorMessage }}
</el-alert>
</div>
</div>
<template #footer>
<el-button @click="detailDialogVisible = false">关闭</el-button>
<el-button
v-if="currentItem && currentItem.status === 0"
type="success"
@click="handleConvertFromDetail"
>
转换为资源
</el-button>
</template>
</el-dialog>
<!-- Convert dialog, using the ArticleAdd component -->
<el-dialog
v-model="convertDialogVisible"
title="转换为资源"
width="90%"
:close-on-click-modal="false"
:destroy-on-close="true"
top="5vh"
>
<ArticleAdd
v-if="convertDialogVisible"
:initial-data="convertFormData"
:show-back-button="false"
@publish-success="handleConvertSuccess"
@back="convertDialogVisible = false"
/>
</el-dialog>
</div>
</AdminLayout>
</template>
<script setup lang="ts">
import { ref, reactive, onMounted } from 'vue';
import { ElMessage, ElMessageBox } from 'element-plus';
import { Search, Refresh } from '@element-plus/icons-vue';
import { crontabApi } from '@/apis/crontab';
import { ArticleAdd } from '@/views/public/article/components';
import type { DataCollectionItem, PageParam, ResourceVO } from '@/types';
import { AdminLayout } from '@/views/admin';
defineOptions({
name: 'ResourceManagementView'
});
// ==================== Data state ====================
const loading = ref(false);
const dataList = ref<DataCollectionItem[]>([]);
const total = ref(0);
const currentItem = ref<DataCollectionItem | null>(null);
const convertItem = ref<DataCollectionItem | null>(null);
// Form data for the conversion
const convertFormData = ref<ResourceVO>({
resource: {},
tags: []
});
// ==================== Search form ====================
const searchForm = reactive({
taskName: '',
logId: '',
title: '',
sourceUrl: '',
status: undefined as number | undefined
});
// ==================== Pagination parameters ====================
const pageParam = reactive<PageParam>({
pageNumber: 1,
pageSize: 20
});
// ==================== Dialog state ====================
const detailDialogVisible = ref(false);
const convertDialogVisible = ref(false);
// ==================== Data loading ====================
/**
* Load the data collection list
*/
async function loadDataList() {
loading.value = true;
try {
const filter: Partial<DataCollectionItem> = {};
if (searchForm.taskName) filter.taskName = searchForm.taskName;
if (searchForm.logId) filter.logId = searchForm.logId;
if (searchForm.title) filter.title = searchForm.title;
if (searchForm.sourceUrl) filter.sourceUrl = searchForm.sourceUrl;
if (searchForm.status !== undefined) filter.status = searchForm.status;
const result = await crontabApi.getCollectionItemPage(filter, pageParam);
if (result.success) {
if (result.pageDomain) {
dataList.value = result.pageDomain.dataList || [];
total.value = result.pageDomain.pageParam?.totalElements || 0;
} else if (result.dataList) {
dataList.value = result.dataList;
total.value = result.pageParam?.totalElements || 0;
} else {
dataList.value = [];
total.value = 0;
}
} else {
ElMessage.error(result.message || '加载数据失败');
dataList.value = [];
total.value = 0;
}
} catch (error) {
console.error('加载数据采集列表失败:', error);
ElMessage.error('加载数据失败');
dataList.value = [];
total.value = 0;
} finally {
loading.value = false;
}
}
// ==================== Search operations ====================
/**
* Search
*/
function handleSearch() {
pageParam.pageNumber = 1;
loadDataList();
}
/**
* Reset the search
*/
function handleReset() {
searchForm.taskName = '';
searchForm.logId = '';
searchForm.title = '';
searchForm.sourceUrl = '';
searchForm.status = undefined;
pageParam.pageNumber = 1;
loadDataList();
}
/**
* Refresh the list
*/
function handleRefresh() {
loadDataList();
}
// ==================== Pagination operations ====================
/**
* Page number changed
*/
function handlePageChange(page: number) {
pageParam.pageNumber = page;
loadDataList();
}
/**
* Page size changed
*/
function handleSizeChange(size: number) {
pageParam.pageSize = size;
pageParam.pageNumber = 1;
loadDataList();
}
// ==================== Detail view ====================
/**
* View details
*/
function handleViewDetail(row: DataCollectionItem) {
currentItem.value = row;
detailDialogVisible.value = true;
}
// ==================== Conversion operations ====================
/**
* Process rich-text content, stripping unnecessary styles
*/
function cleanHtmlContent(html: string): string {
if (!html) return '';
// Create a temporary DOM element to process the HTML
const tempDiv = document.createElement('div');
tempDiv.innerHTML = html;
// Strip inline styles such as font size and font family that may cause display problems
const elementsWithStyle = tempDiv.querySelectorAll('[style]');
elementsWithStyle.forEach((el) => {
const element = el as HTMLElement;
const style = element.style;
// Keep a few important styles and drop the ones likely to conflict
const preservedStyles: string[] = [];
// Keep text color
if (style.color) preservedStyles.push(`color: ${style.color}`);
// Keep background color
if (style.backgroundColor) preservedStyles.push(`background-color: ${style.backgroundColor}`);
// Keep text alignment
if (style.textAlign) preservedStyles.push(`text-align: ${style.textAlign}`);
// Keep vertical margins
if (style.marginTop) preservedStyles.push(`margin-top: ${style.marginTop}`);
if (style.marginBottom) preservedStyles.push(`margin-bottom: ${style.marginBottom}`);
element.setAttribute('style', preservedStyles.join('; '));
});
// Remove external class names to avoid style conflicts
const elementsWithClass = tempDiv.querySelectorAll('[class]');
elementsWithClass.forEach((el) => {
el.removeAttribute('class');
});
return tempDiv.innerHTML;
}
/**
* Open the convert dialog with prefilled data
*/
function handleConvert(row: DataCollectionItem) {
convertItem.value = row;
// Process the rich-text content and clean its styles
const cleanedContent = cleanHtmlContent(row.content || '');
// Prefill the article data
convertFormData.value = {
resource: {
title: row.title || '',
content: cleanedContent,
summary: row.summary || '',
coverImage: row.coverImage || '',
author: row.author || '',
source: row.source || '',
sourceUrl: row.sourceUrl || '',
publishTime: row.publishTime || new Date().toISOString(),
status: 1, // published
allowComment: true,
isTop: false,
isRecommend: false
},
tags: []
};
convertDialogVisible.value = true;
}
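/**
* Convert from the detail dialog
*/
function handleConvertFromDetail() {
// Guard instead of a non-null assertion: do nothing if no item is selected
if (!currentItem.value) return;
detailDialogVisible.value = false;
handleConvert(currentItem.value);
}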
/**
* Callback after a successful conversion
*/
function handleConvertSuccess(resourceId: string) {
ElMessage.success('转换成功');
convertDialogVisible.value = false;
// Mark the collected item as converted
if (convertItem.value?.id) {
// An API call could update the status here; for now just refresh the list
loadDataList();
}
}
// ==================== Helper functions ====================
/**
* Format a date-time
*/
function formatDateTime(dateTime: string | Date | undefined): string {
if (!dateTime) return '-';
const date = typeof dateTime === 'string' ? new Date(dateTime) : dateTime;
if (isNaN(date.getTime())) return '-';
return date.toLocaleString('zh-CN', {
year: 'numeric',
month: '2-digit',
day: '2-digit',
hour: '2-digit',
minute: '2-digit'
});
}
/**
* Truncate a URL for display
*/
function truncateUrl(url: string | undefined): string {
if (!url) return '-';
return url.length > 30 ? url.substring(0, 30) + '...' : url;
}
/**
* Get the status text
*/
function getStatusText(status: number | undefined): string {
switch (status) {
case 0: return '未处理';
case 1: return '已转换';
case 2: return '已忽略';
default: return '未知';
}
}
/**
* Get the status tag type
*/
function getStatusTagType(status: number | undefined): string {
switch (status) {
case 0: return 'warning';
case 1: return 'success';
case 2: return 'info';
default: return '';
}
}
// ==================== Lifecycle ====================
onMounted(() => {
loadDataList();
});
</script>
<style lang="scss" scoped>
@@ -23,8 +657,184 @@ defineOptions({
padding: 24px;
border-radius: 14px;
min-height: 400px;
.header {
display: flex;
justify-content: space-between;
align-items: center;
margin-bottom: 20px;
h2 {
margin: 0;
font-size: 24px;
font-weight: 600;
color: #141F38;
}
.header-actions {
display: flex;
gap: 12px;
}
}
.search-bar {
display: flex;
align-items: center;
justify-content: center;
gap: 16px;
margin-bottom: 20px;
padding: 20px;
background-color: #f8f9fa;
border-radius: 8px;
flex-wrap: wrap;
.search-item {
display: flex;
align-items: center;
gap: 8px;
.search-label {
font-size: 14px;
color: #606266;
white-space: nowrap;
min-width: 70px;
}
}
.search-actions {
display: flex;
gap: 8px;
margin-left: auto;
}
}
// Parse result inside the table
.parse-result {
.result-item {
margin-bottom: 4px;
font-size: 12px;
color: #606266;
&:last-child {
margin-bottom: 0;
}
}
}
// Pagination container
.pagination-container {
margin-top: 20px;
display: flex;
justify-content: flex-end;
}
// Detail dialog styles
.detail-content {
max-height: 70vh;
// overflow-y: auto;
h4 {
margin: 20px 0 10px 0;
font-size: 16px;
font-weight: 600;
color: #303133;
border-left: 4px solid #409eff;
padding-left: 10px;
&:first-child {
margin-top: 0;
}
}
.cover-section {
margin-top: 20px;
}
.summary-section {
margin-top: 20px;
p {
padding: 12px;
background-color: #f5f7fa;
border-radius: 4px;
line-height: 1.8;
color: #606266;
margin: 0;
}
}
.content-section {
margin-top: 20px;
.content-display {
padding: 16px;
background-color: #ffffff;
border: 1px solid #e4e7ed;
border-radius: 4px;
line-height: 1.8;
color: #303133;
max-height: 200px;
overflow-y: auto;
// Rich-text content styles
:deep(img) {
max-width: 100%;
height: auto;
}
:deep(p) {
margin: 8px 0;
}
:deep(h1), :deep(h2), :deep(h3),
:deep(h4), :deep(h5), :deep(h6) {
margin: 16px 0 8px 0;
}
:deep(a) {
color: #409eff;
text-decoration: none;
&:hover {
text-decoration: underline;
}
}
:deep(ul), :deep(ol) {
padding-left: 24px;
}
:deep(blockquote) {
border-left: 4px solid #dcdfe6;
padding-left: 12px;
color: #909399;
margin: 12px 0;
}
:deep(code) {
background-color: #f5f7fa;
padding: 2px 6px;
border-radius: 3px;
font-family: 'Courier New', monospace;
}
:deep(pre) {
background-color: #f5f7fa;
padding: 12px;
border-radius: 4px;
overflow-x: auto;
code {
background-color: transparent;
padding: 0;
}
}
}
}
.convert-info,
.error-info {
margin-top: 20px;
}
}
}
</style>
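
Review note: `formatDateTime` above passes string values straight to `new Date(...)`. The crawler output committed in `output/output.json` carries `publishTime` values such as `2025年11月08日17:22`, which is not a format `Date` is required to parse, so those rows will likely render as `-`. A minimal sketch of a tolerant parser follows, assuming that shape is the only non-standard one. `parseCrawlerDateTime` is a hypothetical helper, not part of this commit.

```typescript
// Hypothetical helper: parse 'YYYY年MM月DD日HH:mm' as local time,
// and fall back to the built-in Date parser for anything else.
function parseCrawlerDateTime(value: string): Date | null {
  const match = /^(\d{4})年(\d{1,2})月(\d{1,2})日\s*(\d{1,2}):(\d{2})$/.exec(value.trim());
  if (match) {
    // Index 0 is the full match, so skip it when destructuring.
    const [, year, month, day, hour, minute] = match.map(Number);
    return new Date(year, month - 1, day, hour, minute);
  }
  const fallback = new Date(value);
  return isNaN(fallback.getTime()) ? null : fallback;
}
```

`formatDateTime` could call this for string inputs and keep its existing `'-'` fallback when the result is `null`.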


@@ -125,6 +125,7 @@ interface Props {
articleId?: string;
showBackButton?: boolean;
backButtonText?: string;
initialData?: ResourceVO;
}
const props = withDefaults(defineProps<Props>(), {
@@ -195,7 +196,7 @@ async function loadCategoryList() {
async function loadTagList() {
try {
tagLoading.value = true;
const result = await resourceTagApi.getTagList();
const result = await resourceTagApi.getTagList({});
if (result.success) {
tagList.value = result.dataList || [];
} else {
@@ -220,7 +221,15 @@ async function handlePublish() {
await formRef.value?.validate();
publishing.value = true;
if (isEdit.value) {
const result = await resourceApi.updateResource(articleForm.value);
if (result.success) {
ElMessage.success('保存成功');
emit('publish-success', result.data?.resource?.resourceID || '');
} else {
ElMessage.error(result.message || '保存失败');
}
} else {
const result = await resourceApi.createResource(articleForm.value);
if (result.success) {
ElMessage.success('发布成功');
@@ -228,6 +237,7 @@ async function handlePublish() {
} else {
ElMessage.error(result.message || '发布失败');
}
}
} catch (error) {
console.error('发布失败:', error);
} finally {
@@ -283,7 +293,16 @@ onMounted(async () => {
loadTagList()
]);
// In edit mode, load the article data
// If initial data is provided, use it to fill the form
if (props.initialData) {
articleForm.value = {
resource: { ...props.initialData.resource },
tags: [...(props.initialData.tags || [])]
};
return;
}
// In edit mode, load the article data
if (props.articleId) {
try {
isEdit.value = true;