01 Building the request headers
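```python
# Headers shared by the index and detail pages, pasted from the browser's
# developer tools as one "Name:value" pair per line.
# "br" is left out of Accept-Encoding because requests only decodes
# Brotli responses when the optional brotli package is installed.
head = """
Accept:text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8
Accept-Encoding:gzip, deflate
Accept-Language:zh-CN,zh;q=0.8
Cache-Control:max-age=0
Connection:keep-alive
Host:maoyan.com
Upgrade-Insecure-Requests:1
Content-Type:application/x-www-form-urlencoded; charset=UTF-8
User-Agent:Mozilla/5.0 (Windows NT 10.0; WOW64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/59.0.3071.86 Safari/537.36
"""


def str_to_dict(header):
    """
    Convert a block of "Name:value" lines into a headers dict, so that
    different functions can build different request headers.
    """
    header_dict = {}
    for h in header.split('\n'):
        h = h.strip()
        if h:  # skip blank lines, such as the one right after the opening quotes
            # Split on the first colon only: values such as a Referer URL contain colons
            k, v = h.split(':', 1)
            header_dict[k] = v.strip()
    return header_dict
```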
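Because the index pages and the detail pages need different request headers, the headers are kept as plain text and turned into a dict by a small helper function, which keeps things simple.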
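The helper makes it cheap to vary one header per page type. The following is a minimal sketch, assuming the only difference between page types is the Referer line. `BASE_HEADERS` and `build_headers()` are hypothetical names, the header block is trimmed for brevity, and the detail-page Referer is only an illustrative value.

```python
# Minimal sketch: one shared header block plus a per-page Referer line.
# BASE_HEADERS and build_headers() are hypothetical names for illustration.

BASE_HEADERS = """
Accept-Language:zh-CN,zh;q=0.8
Host:maoyan.com
User-Agent:Mozilla/5.0
"""


def str_to_dict(header):
    """Parse "Name:value" lines into a dict, skipping blank lines."""
    header_dict = {}
    for line in header.split('\n'):
        line = line.strip()
        if line:
            key, value = line.split(':', 1)  # only the first colon separates name and value
            header_dict[key] = value.strip()
    return header_dict


def build_headers(referer):
    """Return the shared headers plus a page-specific Referer (assumed to be the only difference)."""
    return str_to_dict(BASE_HEADERS + 'Referer:' + referer + '\n')


index_headers = build_headers('http://maoyan.com/films?showType=3&yearId=13&sortId=3&offset=0')
detail_headers = build_headers('http://maoyan.com/films')  # illustrative Referer value only
print(index_headers['Referer'])
```

Because the base block is parsed fresh on every call, each page type gets its own independent dict, so changing one request's headers never leaks into another.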
02 Getting the film detail page links
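```python
import time

import requests
from bs4 import BeautifulSoup


def get_url():
    """
    Collect the film detail page links from the index pages and scrape each film.
    Relies on head and str_to_dict() from step 01; get_message() and to_mysql()
    are defined elsewhere in this tutorial.
    """
    # 10 index pages with 30 films each (offset = 0, 30, ..., 270)
    for i in range(0, 300, 30):
        time.sleep(10)  # pause between index pages to avoid being blocked
        url = 'http://maoyan.com/films?showType=3&yearId=13&sortId=3&offset=' + str(i)
        referer = 'Referer:http://maoyan.com/films?showType=3&yearId=13&sortId=3&offset=0\n'
        headers = str_to_dict(head + referer)
        response = requests.get(url=url, headers=headers)
        soup = BeautifulSoup(response.text, 'html.parser')
        # Title blocks (holding the detail page link) and score blocks, in page order
        data_1 = soup.find_all('div', {'class': 'channel-detail movie-item-title'})
        data_2 = soup.find_all('div', {'class': 'channel-detail channel-detail-orange'})
        for item, score in zip(data_1, data_2):
            time.sleep(10)  # pause between detail pages
            url_1 = item.select('a')[0]['href']
            if score.get_text() != '暂无评分':  # '暂无评分' means "no rating yet"
                url = 'http://maoyan.com' + url_1
                for message in get_message(url):
                    print(message)
                    to_mysql(message)
                print(url)
                print('---------------^^^Film_Message^^^-----------------')
            else:
                # Assuming sortId=3 orders films by rating, every remaining film
                # (on this and later pages) is unrated too, so stop completely.
                print('The Work Is Done')
                return
```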
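The paging and stopping logic of `get_url()` can be checked offline. What follows is a minimal sketch under stated assumptions: `index_page_urls()` and `rated_links()` are hypothetical helpers, the `(href, score)` pairs are made-up sample data, and it assumes, as the `else` branch above implies, that `sortId=3` orders films by rating so that unrated films come last.

```python
# Sketch of the paging and stopping logic from get_url(), with no network access.
# index_page_urls() and rated_links() are hypothetical helpers for illustration.

BASE_URL = 'http://maoyan.com/films?showType=3&yearId=13&sortId=3&offset='
NO_RATING = '暂无评分'  # text shown for a film that has no rating yet


def index_page_urls(total=300, page_size=30):
    """Yield one index page URL per offset: 0, 30, ..., total - page_size."""
    for offset in range(0, total, page_size):
        yield BASE_URL + str(offset)


def rated_links(films):
    """
    Given (href, score_text) pairs in page order, return absolute detail page
    links up to (but not including) the first unrated film.
    Assumes the list is sorted by rating, so nothing after it is rated.
    """
    links = []
    for href, score in films:
        if score == NO_RATING:
            break
        links.append('http://maoyan.com' + href)
    return links


urls = list(index_page_urls())
print(len(urls), urls[-1])
# Made-up sample pairs, not real Maoyan data
print(rated_links([('/films/1', '9.5'), ('/films/2', '8.7'), ('/films/3', NO_RATING), ('/films/4', '9.0')]))
```

Keeping these steps apart from the HTTP calls makes it possible to test them without sending any requests to maoyan.com.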