A China News Crawler Written in Python

Published: 2024-01-05 Column: Website Building Knowledge Categories: Python tutorials, Python crawlers

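To crawl news from chinanews, you can use Python's requests and BeautifulSoup libraries. First, install both: `pip install requests` and `pip install beautifulsoup4`. Then write the crawler as follows:

```python
import requests
from bs4 import BeautifulSoup

url = 'https://www.chinanews.com/'  # replace with the news URL you want to crawl
response = requests.get(url)
soup = BeautifulSoup(response.text, 'html.parser')

# Add the news fields you want to extract here, e.g. title, publish time
# Example: extract all article titles
titles = soup.find_all('h3', class_='title')
for title in titles:
    print(title.text)
```

Note: depending on the site's actual structure, you may need to change the tag and class names in the code.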

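The complete script below crawls the first two scroll-news pages and writes each item's section, title, and time to an Excel file:

```python
import requests
from bs4 import BeautifulSoup
from openpyxl import Workbook
from datetime import datetime

# ----- Reference docs for the three libraries -----
# https://docs.python-requests.org/en/latest/
# https://www.crummy.com/software/BeautifulSoup/bs4/doc/
# https://openpyxl.readthedocs.io/en/stable/
# https://docs.python.org/3/library/stdtypes.html#str.strip (used to trim the brackets)
# ----- Reference docs for the three libraries -----

# 5.1: added a normalized date format for the output file name
# Get the current time
now = datetime.now()
# Format the date without zero padding, e.g. 2024-1-5.
# The '%-m' / '%-d' strftime flags are platform-specific (they fail on Windows),
# so the fields are read directly instead.
formatted_time = f"{now.year}-{now.month}-{now.day}"

# Create a Workbook object for writing the Excel file
wb = Workbook()
# Use the active sheet and set its name
sheet = wb.active
sheet.title = 'Sheet1'

# Add the header row
sheet['A1'] = 'Section'
sheet['B1'] = 'Title'
sheet['C1'] = 'Time'

# Browser-like request header, a common way to get past basic anti-crawler checks
headers = {
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 '
                  '(KHTML, like Gecko) Chrome/58.0.3029.110 Safari/537.3'
}

# Iterate over pages 1 and 2
for page_num in range(1, 3):
    url = f"https://www.chinanews.com.cn/scroll-news/news{page_num}.html"
    r = requests.get(url, headers=headers)
    r.encoding = 'utf-8'
    soup = BeautifulSoup(r.text, 'html.parser')

    # Collect the section, title, and time elements
    dangdu_lanmu = soup.find_all('div', class_='dd_lm')
    dangdu_biaoti = soup.find_all('div', class_='dd_bt')
    dangdu_time = soup.find_all('div', class_='dd_time')

    # Append one row per news item; append() always writes to the next free row
    for news_num in range(len(dangdu_lanmu)):
        sheet.append([
            dangdu_lanmu[news_num].text.strip('[]'),
            dangdu_biaoti[news_num].text,
            dangdu_time[news_num].text,
        ])

# Save the Excel file
wb.save("chinanews_{}.xlsx".format(formatted_time))
```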
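Two small steps in the script can be tried without any network access: building the date-stamped file name and turning the three lists of text into rows. The sketch below uses only the standard library. The helper names `output_filename` and `build_rows` are hypothetical and not part of the original script, and pairing the lists with `zip` is a swapped-in choice rather than the original index-based loop.

```python
from datetime import date


def output_filename(day):
    """Return a name such as chinanews_2024-1-5.xlsx (no zero padding)."""
    # Reading year/month/day directly avoids platform-specific strftime flags.
    return f"chinanews_{day.year}-{day.month}-{day.day}.xlsx"


def build_rows(sections, titles, times):
    """Pair three lists of strings into [section, title, time] rows."""
    rows = []
    # zip stops at the shortest list, so a length mismatch between the
    # lists cannot raise an IndexError.
    for section, title, time_text in zip(sections, titles, times):
        # strip('[]') removes any leading or trailing '[' and ']' characters.
        rows.append([section.strip().strip('[]'), title.strip(), time_text.strip()])
    return rows
```

For example, `build_rows(["[国内]", "[财经]"], ["标题一", "标题二"], ["1-5 10:00"])` yields a single row, because the third list has only one entry. Note that `zip` silently drops the unmatched items, so if the page can omit a field for some entries, the remaining rows may still be misaligned and would need a per-item lookup instead.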
