Batch-scraping WeChat Official Accounts with Python to download audio and video

Published: 2024-01-05 | Section: Website building | Category: Python tutorials | Tags: Python crawling, Python scraping

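To batch-scrape WeChat Official Accounts with Python and download their audio and video, you can log in to WeChat with a third-party library such as `itchat` or `wxpy`, then use the `requests` library to fetch the article links. Next, determine whether each article contains audio or video and download it with the corresponding method. The `pydub` library can be used to process the audio and the `moviepy` library to process the video. Finally, save the downloaded audio and video locally. The script in this article needs only `requests` and regular expressions: it fetches public article pages directly, without logging in. Note that this approach may violate the WeChat Official Accounts terms of use, so use it with caution.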
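The step of telling audio from video can be checked offline with a small pure function. This is a minimal sketch, assuming (as the script below does) that voice clips appear in the page source as `"voice_id":"..."` and videos as ids starting with `wxv_`. The names `find_media`, `VOICE_URL`, `VIDEO_API` and the sample HTML are hypothetical, and the two endpoint URLs are simply the ones the script requests; WeChat may change them at any time.

```python
import re
from typing import List, Tuple

# Endpoints copied from the script in this article (may change without notice)
VOICE_URL = 'https://res.wx.qq.com/voice/getvoice?mediaid={}'
VIDEO_API = ('https://mp.weixin.qq.com/mp/videoplayer?action=get_mp_video_play_url'
             '&preview=0&vid={}')


def find_media(page_text: str) -> List[Tuple[str, str]]:
    """Hypothetical helper: list ('audio' | 'video', request URL) pairs in an article page."""
    items = []
    # Every embedded voice clip shows up as "voice_id":"<id>"
    for aid in re.findall(r'"voice_id":"(.*?)"', page_text):
        items.append(('audio', VOICE_URL.format(aid)))
    # Video ids: "wxv_" followed by 19 characters; dict.fromkeys drops duplicates, keeps order
    for vid in dict.fromkeys(re.findall(r'wxv_.{19}', page_text)):
        items.append(('video', VIDEO_API.format(vid)))
    return items


sample = '{"voice_id":"MzA5abc"} <iframe data-mpvid="wxv_1234567890123456789"></iframe>'
for kind, link in find_media(sample):
    print(kind, link)
```

Unlike the script, which only downloads the first video on a page, this sketch lists every distinct video id it finds.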

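I previously posted a tutorial on batch-scraping Official Accounts with Python. This time, instead of collecting read-count data, the script batch-downloads the articles themselves together with their audio and video. Here is the code: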

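```python
import html
import re
import time
from random import randint

import requests

# Assumption: `headers` and trimName() are not shown in the original post;
# these definitions are placeholders.
headers = {'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64)'}


def trimName(name):
    # Replace characters that are not allowed in file names
    return re.sub(r'[\\/:*?"<>|\r\n]+', '_', name).strip()


def video(res, headers, date):
    # Video ids look like "wxv_" followed by 19 characters; only the first is used
    vid = re.search(r'wxv_.{19}', res.text)
    if vid:
        vid = vid.group(0)
        print('Video id', vid)
        url = ('https://mp.weixin.qq.com/mp/videoplayer?action=get_mp_video_play_url'
               f'&preview=0&vid={vid}')
        data = requests.get(url, headers=headers, timeout=10).json()
        video_url = data['url_info'][0]['url']
        video_data = requests.get(video_url, headers=headers)
        name = trimName(data['title'])
        print('Downloading video: ' + name + '.mp4')
        with open(date + '___' + name + '.mp4', 'wb') as f:
            f.write(video_data.content)


def audio(res, headers, date, title):
    # Each embedded voice clip appears as "voice_id":"..." in the page source
    aids = re.findall(r'"voice_id":"(.*?)"', res.text)
    time.sleep(2)
    for tmp, aid in enumerate(aids, start=1):
        url = f'https://res.wx.qq.com/voice/getvoice?mediaid={aid}'
        audio_data = requests.get(url, headers=headers)
        print('Downloading audio: ' + title + '.mp3')
        with open(date + '___' + trimName(title) + '___' + str(tmp) + '.mp3', 'wb') as f5:
            f5.write(audio_data.content)


url = input('Enter article URL: ')
response = requests.get(url, headers=headers)
# Article links listed on the entered page, plus the page itself
urls = re.findall(r'<a target="_blank" href="(https?://mp\.weixin\.qq\.com/s?.*?)"', response.text)
urls.append(url)
print('Total articles', len(urls))
for mp_url in urls:
    res = requests.get(html.unescape(mp_url), proxies={'http': None, 'https': None},
                       verify=False, headers=headers)
    # Un-lazy images; make only protocol-relative //res.wx.qq.com links absolute
    content = res.text.replace('data-src', 'src')
    content = re.sub(r'(?<!:)//res\.wx\.qq\.com', 'https://res.wx.qq.com', content)
    try:
        title = re.search(r"var msg_title = '(.*?)'", content).group(1)
        ct = re.search(r'var ct = "(.*?)";', content).group(1)  # Unix timestamp
        date = time.strftime('%Y-%m-%d', time.localtime(int(ct)))
        print(date, title)
        audio(res, headers, date, title)
        video(res, headers, date)
        with open(date + '_' + trimName(title) + '.html', 'w', encoding='utf-8') as f:
            f.write(content)
    except Exception as err:
        # Parsing or downloading failed: keep the raw page under a random name
        print('Failed:', mp_url, err)
        with open(str(randint(1, 10)) + '.html', 'w', encoding='utf-8') as f:
            f.write(content)
```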
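Since everything hinges on the exact shape of WeChat's page source, it helps to test the regular expressions against a saved page before running the full download loop. This is a minimal sketch, assuming pages still contain `var msg_title = '...'` and `var ct = "..."` (a Unix timestamp) as the script expects; `parse_meta`, `find_article_links` and the sample snippet are hypothetical.

```python
import html
import re
import time


def find_article_links(page_text):
    """Hypothetical helper: article links on a page, with &amp; decoded."""
    links = re.findall(r'<a target="_blank" href="(https?://mp\.weixin\.qq\.com/s?.*?)"', page_text)
    return [html.unescape(u) for u in links]


def parse_meta(page_text):
    """Hypothetical helper: (date, title) of an article page, or None if not found."""
    title = re.search(r"var msg_title = '(.*?)'", page_text)
    ct = re.search(r'var ct = "(\d+)";', page_text)
    if not (title and ct):
        return None
    # Same formatting as the script: local date of the publish timestamp
    date = time.strftime('%Y-%m-%d', time.localtime(int(ct.group(1))))
    return date, title.group(1)


sample = (
    '<a target="_blank" href="https://mp.weixin.qq.com/s?__biz=MzA&amp;mid=1&amp;idx=1">a</a>\n'
    "var msg_title = 'Demo title'.html(false);\n"
    'var ct = "1704456000";\n'
)
print(find_article_links(sample))
print(parse_meta(sample))
```

To try it on a real article, save the page source to a file and pass its text to the two functions; a `None` from `parse_meta` means the page layout no longer matches.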

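The downloaded audio and video files are saved in the current directory, and the saved article HTML files can then be converted to PDF with Python.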
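For the HTML-to-PDF step, one common option is the third-party `pdfkit` package, a wrapper around the `wkhtmltopdf` command-line tool. This sketch is an assumption rather than the author's method: `pdf_jobs` and `convert_all` are hypothetical helpers, and both `pdfkit` and the `wkhtmltopdf` binary must be installed for the conversion itself to run.

```python
from pathlib import Path
from typing import List, Tuple


def pdf_jobs(folder: str = '.') -> List[Tuple[Path, Path]]:
    """Pair every saved .html file in `folder` with the .pdf path it should become."""
    return [(p, p.with_suffix('.pdf')) for p in sorted(Path(folder).glob('*.html'))]


def convert_all(folder: str = '.') -> None:
    # Assumption: pdfkit and wkhtmltopdf are installed; imported lazily so that
    # pdf_jobs() still works without them.
    import pdfkit

    for html_path, pdf_path in pdf_jobs(folder):
        if pdf_path.exists():
            continue  # already converted on a previous run
        print('Converting', html_path.name)
        pdfkit.from_file(str(html_path), str(pdf_path), options={'encoding': 'UTF-8'})
```

Call `convert_all()` from the directory that holds the downloaded files; pages that already have a PDF next to them are skipped, so the conversion can be re-run after each scrape.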
