Writing a Python Script to Automatically Scrape the 轻壁纸 (Light Wallpaper) Site

Published: 2024-01-05  Section: Site-Building Knowledge  Category: Python Tutorials, Python Scraping, Python Wallpapers

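To scrape the 轻壁纸 wallpaper site automatically with Python, use the requests library to fetch each page and the BeautifulSoup library to parse it and extract the image links. First install the libraries: `pip install requests beautifulsoup4`. Then write the script: 1. import the libraries; 2. send a request and get the page content; 3. parse the page with BeautifulSoup and extract the image links; 4. download the images and save them locally. Respect the site's crawler policy and keep the request rate and interval reasonable. Note that the script below parses with lxml rather than BeautifulSoup and also imports rich and loguru, so running it requires `pip install requests lxml rich loguru`.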
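The advice about crawler policy and request spacing can be sketched with the standard library alone. The following is a minimal illustration, not part of the article's script: `USER_AGENT`, `MIN_DELAY_SECONDS`, `build_robot_parser`, `Throttle`, and `allowed_urls` are hypothetical names, and the two-second delay is an arbitrary assumption rather than a value the site publishes.

```python
import time
from urllib.robotparser import RobotFileParser

# Hypothetical values for illustration; not taken from the article's script.
USER_AGENT = "wallpaper-fetcher"
MIN_DELAY_SECONDS = 2.0


def build_robot_parser(robots_txt: str) -> RobotFileParser:
    """Parse the text of a robots.txt file that has already been downloaded."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser


def allowed_urls(parser, urls, user_agent=USER_AGENT):
    """Keep only the URLs that robots.txt permits for this user agent."""
    return [u for u in urls if parser.can_fetch(user_agent, u)]


class Throttle:
    """Sleep as needed so consecutive calls are at least `delay` seconds apart."""

    def __init__(self, delay, clock=time.monotonic, sleep=time.sleep):
        self.delay = delay
        self._clock = clock
        self._sleep = sleep
        self._last = None  # time at which the previous request was allowed

    def wait(self) -> float:
        """Block until the next request is allowed; return the seconds slept."""
        now = self._clock()
        pause = 0.0
        if self._last is not None:
            pause = max(0.0, self.delay - (now - self._last))
            if pause:
                self._sleep(pause)
        self._last = now + pause
        return pause
```

One way to use this would be to download the site's robots.txt once at startup (if it publishes one), filter the page URLs through `allowed_urls`, and call `Throttle(MIN_DELAY_SECONDS).wait()` before every request.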

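轻壁纸 is a free wallpaper site shared by a veteran member of the 吾爱 forum. Every wallpaper on it is free and high-definition, at 4K or 2K resolution. Attached is a Python script that scrapes 轻壁纸 automatically, for a new wallpaper every 0 minutes [sic].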

Python scraping script

    import re
    from pathlib import Path

    import requests
    from loguru import logger
    from lxml import etree
    from requests.adapters import HTTPAdapter
    from rich import print

    # Write the log to img.log next to this script
    logpath = Path(__file__).parent.joinpath('img.log')
    logger.add(str(logpath))


    def get_res(url):
        """Fetch a URL and return the response."""
        headers = {
            "user-agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) "
                          "AppleWebKit/537.36 (KHTML, like Gecko) "
                          "Chrome/95.0.4638.69 Safari/537.36"
        }
        r = requests.Session()
        # Retry failed connections up to 5 times
        r.mount('https://', HTTPAdapter(max_retries=5))
        res = r.get(url, headers=headers, timeout=30)
        return res


    def parse_src(res):
        """Parse the page's src attributes and return the image download URLs."""
        try:
            et = etree.HTML(res.text)
            masonry = et.xpath("//div[@class='masonry']")[-1]
            # The leading "." keeps the query relative to the masonry container
            src = masonry.xpath(".//article//a[@class='entry-thumbnail']/img/@data-src")
            # Strip the "-WIDTHxHEIGHT" thumbnail suffix to get the original image URL
            # (assumes thumbnails are named like name-300x200.jpg)
            return [re.sub(r"-\d+x\d+(?=\.\w+)", "", s) for s in src]
        except Exception as e:
            logger.error(f"Failed to access page {res.url} ({e}); please retry!")


    def download_img(img_url_list):
        """Download every image in the list that is not already on disk."""
        if not isinstance(img_url_list, list):
            return
        path = Path(__file__).parent.joinpath('images')
        path.mkdir(parents=True, exist_ok=True)
        for imgurl in img_url_list:
            file_name = imgurl.split('/')[-1].replace("?", "")
            target = path.joinpath(file_name)
            # Skip files that were downloaded on an earlier run
            if target.exists():
                print(f"File {file_name} has already been downloaded; skipping")
                continue
            res = get_res(imgurl)
            if res:  # a Response is truthy only for status codes below 400
                target.write_bytes(res.content)
                print(f"Downloaded {file_name}, saved in {path}")


    def main(startnum=1, endnum=20):
        """Main logic: crawl the listing pages and download their images."""
        urls = [f"https://bz.qinggongju.com/page/{num}/" for num in range(startnum, endnum + 1)]
        for url in urls:
            download_img(parse_src(get_res(url)))


    if __name__ == "__main__":
        startnum = input('There are 20 pages of popular images. Enter the start page number: ')
        endnum = input('Enter the end page number (no greater than 20): ')
        try:
            start, end = int(startnum), int(endnum)
        except ValueError:
            start = end = 0  # non-numeric input falls through to the error branch
        if 1 <= start <= end <= 20:
            main(start, end)
        else:
            print('[red]Error: please restart the program and enter valid numbers!')
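The least obvious step in `parse_src` is how it turns each thumbnail's `data-src` value into the URL of the full-size image: it removes what appears to be a `-WIDTHxHEIGHT` size suffix in front of the file extension. The sketch below isolates that step as a regular-expression helper. The function name `full_size_url` and the sample URLs are hypothetical, and the suffix format is inferred from the script's logic, not confirmed against the site.

```python
import re

# Matches a "-<width>x<height>" size suffix that sits directly before a file extension.
_SIZE_SUFFIX = re.compile(r"-\d+x\d+(?=\.\w+)")


def full_size_url(thumbnail_url: str) -> str:
    """Drop the size suffix from a thumbnail URL; return it unchanged if there is none."""
    return _SIZE_SUFFIX.sub("", thumbnail_url)


# Hypothetical URL, used only to show the transformation.
print(full_size_url("https://example.com/uploads/xbox-wallpaper-300x200.jpg"))
# prints https://example.com/uploads/xbox-wallpaper.jpg
```

Matching digits on both sides of the `x` means other occurrences of the letter in the host or file name (as in `example.com` or `xbox` here) do not affect the result.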
