# Crawler Pages

## [People's Daily Online Search](http://search.people.cn/) and Hot Topic Ranking

```python
CrawlerConfig(
    base_url="http://www.people.com.cn",
    urls={
        "search": UrlConfig(
            url="http://search.people.cn/search-platform/front/search",
            method="POST",
            params={
                "key": "",
                "page": 1,
                "limit": 10,
                "hasTitle": True,
                "hasContent": True,
                "isFuzzy": True,
                "type": 0,  # 0 all, 1 news, 2 interactive, 3 newspapers, 4 images, 5 videos
                "sortType": 2,  # 1 by relevance, 2 by time
                "startTime": 0,
                "endTime": 0
            }
        ),
        "hot_point_rank": UrlConfig(
            url="http://search.people.cn/search-platform/front/searchRank",
            method="GET",
            params={}
        )
    },
    headers={
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/142.0.0.0 Safari/537.36',
        'Accept': 'application/json, text/plain, */*',
        'Accept-Language': 'zh-CN,zh;q=0.9',
        'Content-Type': 'application/json;charset=UTF-8'
    }
)
```
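
As a rough illustration of how the `search` entry might be used, the sketch below builds (but does not send) a POST request with the standard library. It assumes the `params` are serialized as a JSON body, which the `Content-Type` header suggests but these notes do not confirm. `build_search_request` is a hypothetical helper and is not part of `CrawlerConfig`.

```python
import json
import urllib.request

SEARCH_URL = "http://search.people.cn/search-platform/front/search"


def build_search_request(key: str, page: int = 1, limit: int = 10) -> urllib.request.Request:
    """Build a POST request for the search endpoint without sending it."""
    # Same fields as the "search" params in the config above.
    payload = {
        "key": key,
        "page": page,
        "limit": limit,
        "hasTitle": True,
        "hasContent": True,
        "isFuzzy": True,
        "type": 0,
        "sortType": 2,
        "startTime": 0,
        "endTime": 0,
    }
    # Assumption: the endpoint expects the params as a JSON request body.
    body = json.dumps(payload).encode("utf-8")
    headers = {
        "Accept": "application/json, text/plain, */*",
        "Content-Type": "application/json;charset=UTF-8",
    }
    return urllib.request.Request(SEARCH_URL, data=body, headers=headers, method="POST")
```

Passing the returned object to `urllib.request.urlopen` would send the request. The shape of the response is not described here, so parsing it is left out.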

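## [Featured Headlines](http://www.people.com.cn/GB/59476/index.html)

> Fetch all featured headlines for a given date (`yyyyMMdd`):
> http://www.people.com.cn/GB/59476/review/yyyyMMdd.html

The response is a single HTML file containing all featured news for that day.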
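
The date-based URL can be built with a small helper. This is a minimal sketch; `review_url` and `REVIEW_URL_TEMPLATE` are hypothetical names introduced only for illustration.

```python
from datetime import date

REVIEW_URL_TEMPLATE = "http://www.people.com.cn/GB/59476/review/{day}.html"


def review_url(day: date) -> str:
    """Return the featured-headlines review URL for the given date."""
    # yyyyMMdd in the URL pattern corresponds to %Y%m%d in strftime.
    return REVIEW_URL_TEMPLATE.format(day=day.strftime("%Y%m%d"))
```

For example, `review_url(date(2024, 1, 5))` yields `http://www.people.com.cn/GB/59476/review/20240105.html`.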

## HTML structure of a news detail page

```text
---------------------------------------------------------------------------
Navigation bar (xxxx, etc.)
---------------------------------------------------------------------------
# Two-column layout: col col-1 fr                  | col col-2 fr
News title: h1                                     | Hot ranking: rm_ranking cf
News author: author cf                             |
Time and channel: channel cf (col-1-1 fl)          |
News body (with img, video tags): rm_txt_con cf    | QR code: tjewm1 cf
img and video sources are relative paths; join them with baseUrl
---------------------------------------------------------------------------
```
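
Resolving the relative `img` and `video` paths against the base URL could look like the sketch below. It uses only the standard library and, for brevity, collects `src` attributes from the whole document rather than only from the `rm_txt_con cf` container. `MediaLinkParser` and `extract_media_urls` are hypothetical names, and the sample paths in the usage note are made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

BASE_URL = "http://www.people.com.cn"


class MediaLinkParser(HTMLParser):
    """Collect absolute URLs for the src attribute of img and video tags."""

    def __init__(self, base_url: str):
        super().__init__()
        self.base_url = base_url
        self.media_urls = []

    def handle_starttag(self, tag, attrs):
        if tag in ("img", "video"):
            src = dict(attrs).get("src")
            if src:
                # urljoin leaves absolute URLs alone and resolves relative ones.
                self.media_urls.append(urljoin(self.base_url, src))


def extract_media_urls(html: str, base_url: str = BASE_URL) -> list:
    parser = MediaLinkParser(base_url)
    parser.feed(html)
    return parser.media_urls
```

Calling `extract_media_urls('<img src="/pic/a.jpg">')` returns `['http://www.people.com.cn/pic/a.jpg']`.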


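## [Anti-Corruption](http://fanfu.people.com.cn/index1.html)

Paginated listing: http://fanfu.people.com.cn/index{page}.html

Build the URL from the page number, send a GET request, and wait for the HTML response.
Then visit each individual news link found in that page and process it with the detail-page parsing.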
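
The pagination and link-collection steps might be sketched as follows. The markup of the listing page is not described in these notes, so the collector simply gathers every `<a href>` and resolves it against the page URL; a real crawler would need to filter for news links. `list_page_url`, `LinkCollector`, and `collect_links` are hypothetical names.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

FANFU_PAGE_TEMPLATE = "http://fanfu.people.com.cn/index{page}.html"


def list_page_url(page: int) -> str:
    """Return the listing URL for the given page number."""
    return FANFU_PAGE_TEMPLATE.format(page=page)


class LinkCollector(HTMLParser):
    """Collect absolute URLs from every anchor tag in a listing page."""

    def __init__(self, page_url: str):
        super().__init__()
        self.page_url = page_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            href = dict(attrs).get("href")
            if href:
                self.links.append(urljoin(self.page_url, href))


def collect_links(html: str, page_url: str) -> list:
    collector = LinkCollector(page_url)
    collector.feed(html)
    return collector.links
```

Each collected link would then be fetched and handed to the detail-page parsing.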