[Python Web Scraping + Data Analysis] How Many 2018 Movies Did You Watch?

Published: 2018-12-26 06:23:23 Category: Tutorials Source: 法纳斯特

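Summary: December is under way, and 2018 ends in a little over half a month. Remember the goals you set at the start of the year? How many did you finish? I suspect many people are in the same boat as me, with their heads in their hands... This article uses Maoyan Movies (猫眼电影) to analyze data on films from 2018. I. Page analysis 01 Tags: clicking the category tags that Maoyan Movies has already set up yields the URL information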

01 Building the request headers

# Shared request headers, written as raw "Name:value" lines
head = """
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding:gzip, deflate, br
Accept-Language:zh-CN,zh;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Host:maoyan.com
Upgrade-Insecure-Requests:1
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36
"""

def str_to_dict(header):
    """
    Build the request headers; different functions can build different headers
    """
    header_dict = {}
    # Split on newlines (the published code had 'n' where '\n' was intended)
    header = header.split('\n')
    for h in header:
        h = h.strip()
        if h:
            # Split on the first colon only, so values containing ':' stay intact
            k, v = h.split(':', 1)
            header_dict[k] = v.strip()
    return header_dict

Because the index pages and the detail pages need different request headers, a helper function is used here to keep things simple.
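To make the idea concrete, here is a minimal sketch of how such a helper can be reused: the shared lines stay in one string, and a page-specific line is appended before parsing. The shortened `base` block and the variable names `base` and `referer` are assumptions made for illustration; the function body follows the one above.

```python
def str_to_dict(header):
    """Parse a block of 'Name:value' lines into a dict."""
    header_dict = {}
    for line in header.split('\n'):
        line = line.strip()
        if line:
            # maxsplit=1 keeps later colons (e.g. in URLs) inside the value
            key, value = line.split(':', 1)
            header_dict[key.strip()] = value.strip()
    return header_dict


# Shared lines, shortened here for illustration (assumption)
base = """
Host:maoyan.com
Connection:keep-alive
"""

# Page-specific line appended before parsing
referer = "Referer:http://maoyan.com/films?showType=3&yearId=13&sortId=3&offset=0\n"

headers = str_to_dict(base + referer)
print(headers)
```

Splitting on the first colon only is what lets the `Referer` value keep the `http://` part of its URL.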

02 Getting the movie detail page links

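import time

import requests
from bs4 import BeautifulSoup


def get_url():
    """
    Get the links to the movie detail pages
    """
    # get_message() and to_mysql() are called below but are not defined in this section
    for i in range(0, 300, 30):
        time.sleep(10)  # pause between index-page requests
        url = 'http://maoyan.com/films?showType=3&yearId=13&sortId=3&offset=' + str(i)
        # Index pages need a Referer line on top of the shared headers
        referer = """Referer:http://maoyan.com/films?showType=3&yearId=13&sortId=3&offset=0
        """
        header = head + referer
        headers = str_to_dict(header)
        response = requests.get(url=url, headers=headers)
        html = response.text
        soup = BeautifulSoup(html, 'html.parser')
        # data_1 holds the title blocks (with the detail link), data_2 the rating blocks
        data_1 = soup.find_all('div', {'class': 'channel-detail movie-item-title'})
        data_2 = soup.find_all('div', {'class': 'channel-detail channel-detail-orange'})
        for index, item in enumerate(data_1):
            time.sleep(10)  # pause between detail-page requests
            url_1 = item.select('a')[0]['href']
            # '暂无评分' is the site's "no rating yet" text
            if data_2[index].get_text() != '暂无评分':
                detail_url = 'http://maoyan.com' + url_1
                for message in get_message(detail_url):
                    print(message)
                    to_mysql(message)
                print(detail_url)
                print('---------------^^^Film_Message^^^-----------------')
            else:
                print('The Work Is Done')
                # Leaves only the inner loop; the outer loop moves on to the next offset
                break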
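The two pieces of logic in `get_url()` can be tried out without touching the network: stepping through the `offset` parameter, and collecting links until the first unrated film. The sketch below is only an illustration under stated assumptions: `page_urls`, `rated_links`, and the sample `(href, rating)` pairs are hypothetical names and values, not part of the original code, and `itertools.takewhile` stands in for the `break` in the loop.

```python
from itertools import takewhile
from urllib.parse import urlencode

BASE_URL = 'http://maoyan.com/films'
SITE_ROOT = 'http://maoyan.com'
NO_RATING = '暂无评分'  # the site's "no rating yet" text


def page_urls(pages=10, page_size=30):
    """Yield one index-page URL per offset step (0, 30, ..., 270 by default)."""
    for offset in range(0, pages * page_size, page_size):
        query = urlencode({'showType': 3, 'yearId': 13, 'sortId': 3, 'offset': offset})
        yield f'{BASE_URL}?{query}'


def rated_links(entries):
    """Return absolute detail URLs up to, but not including, the first unrated entry."""
    rated = takewhile(lambda entry: entry[1] != NO_RATING, entries)
    return [SITE_ROOT + href for href, _ in rated]


urls = list(page_urls())

# Hypothetical (href, rating text) pairs standing in for one parsed index page
sample = [('/films/1', '9.6'), ('/films/2', '9.1'), ('/films/3', NO_RATING), ('/films/4', '8.0')]
links = rated_links(sample)

print(urls[0])
print(links)
```

Keeping these steps as small pure functions makes it possible to check the paging and the stop condition before adding the 10-second sleeps and real requests.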

(Editor: 东莞站长网)
