Hi everyone,
I need your help: I can't get the pagination to parse. The link shown in the request headers is
https://www.strava.com/segments/9926585/leaderboar...
and the one in the HTML is
https://www.strava.com/segments/9926585/leaderboar...
Request headers:
Request URL: https://www.strava.com/segments/9926585/leaderboard?club_id=225082&page=1&per_page=25&partial=true
Request Method: GET
Status Code: 200
Remote Address: 192.168.7.10:3128
Referrer Policy: no-referrer-when-downgrade
cache-control: no-cache, no-store
content-encoding: gzip
content-type: text/html; charset=utf-8
date: Wed, 27 May 2020 13:14:20 GMT
etag: W/"1f29ed5179163df3e47f122e9f646fd2"
expires: Sat, 01 Jan 2000 00:00:00 GMT
pragma: no-cache
referrer-policy: strict-origin-when-cross-origin
status: 200
status: 200 OK
via: 1.1 linkerd
x-content-type-options: nosniff
x-download-options: noopen
x-frame-options: DENY
x-permitted-cross-domain-policies: none
x-request-id: d3834ff4-23b8-4fdb-8073-aef8ad68bd6b
x-xss-protection: 1; mode=block
:authority: www.strava.com
:method: GET
:path: /segments/9926585/leaderboard?club_id=225082&page=1&per_page=25&partial=true
:scheme: https
accept: text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9
accept-encoding: gzip, deflate, br
accept-language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
cache-control: max-age=0
cookie: ?????
sec-fetch-dest: document
sec-fetch-mode: navigate
sec-fetch-site: none
sec-fetch-user: ?1
upgrade-insecure-requests: 1
user-agent: Mozilla/5.0 (Windows NT 6.1; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/81.0.4044.122 Safari/537.36
club_id: 225082
page: 1
per_page: 25
partial: true
HTML as shown in Chrome DevTools:
<nav>
<ul class="pagination" data-filter="overall">
<li class="previous_page disabled"><span>←</span></li>
<li class="active"><span>1</span></li>
<li><a rel="next" href="/segments/9926585/leaderboard?club_id=225082&filter=overall&page=2&per_page=25">2</a></li>
<li><a href="/segments/9926585/leaderboard?club_id=225082&filter=overall&page=3&per_page=25">3</a></li>
<li><a href="/segments/9926585/leaderboard?club_id=225082&filter=overall&page=4&per_page=25">4</a></li>
<li><a href="/segments/9926585/leaderboard?club_id=225082&filter=overall&page=5&per_page=25">5</a></li>
<li class="next_page"><a rel="next" href="/segments/9926585/leaderboard?club_id=225082&filter=overall&page=2&per_page=25">→</a>
</li>
</ul>
</nav>
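For reference, the number of pages could be read straight from this markup. Below is a minimal sketch using only the standard library; `PaginationParser` and `last_page` are hypothetical names made up for illustration and are not part of my script (which uses its own `pages_number`). It assumes the links keep the `page=` query parameter shown above.

```python
from html.parser import HTMLParser
from urllib.parse import urlsplit, parse_qs


class PaginationParser(HTMLParser):
    """Collects the `page` value of every link inside <ul class="pagination">."""

    def __init__(self):
        super().__init__()
        self.in_pagination = False
        self.pages = []

    def handle_starttag(self, tag, attrs):
        attrs = dict(attrs)
        if tag == "ul" and "pagination" in (attrs.get("class") or "").split():
            self.in_pagination = True
        elif tag == "a" and self.in_pagination and attrs.get("href"):
            # Read the page number from the link's query string
            query = parse_qs(urlsplit(attrs["href"]).query)
            if "page" in query:
                self.pages.append(int(query["page"][0]))

    def handle_endtag(self, tag):
        if tag == "ul":
            self.in_pagination = False


def last_page(html):
    """Return the highest page number linked from the pagination block (1 if none)."""
    parser = PaginationParser()
    parser.feed(html)
    return max(parser.pages, default=1)
```

Note that this only sees the page links present in the markup, so if the site shortens long pagination lists the real last page may be higher.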
# Fetch one leaderboard page and collect the rows for athletes listed in NAMES
def get_table_data(num):
    lis = []
    url = '{}/leaderboard?club_id=225082&filter=overall&page={}&per_page=25&partial=true'.format(conf.URL_RATING, num)
    print(url)
    response = session.get(url, headers=headers)
    print(response)
    soup = BeautifulSoup(response.text, "lxml")
    table = soup.find('table', {'class': 'table table-striped table-padded table-leaderboard'}).find('tbody').find_all('tr')
    print('===================== All listed athletes with their data ========================')
    for rows in table:
        col = rows.find_all('td')
        if col[1].a.text.replace('\n', '') in NAMES:
            # A missing cell or link yields None instead of aborting the row
            try:
                rating = col[0].text.replace('\n', '')
            except (AttributeError, IndexError):
                rating = None
            try:
                name = col[1].a.text.replace('\n', '')
            except (AttributeError, IndexError):
                name = None
            try:
                date = col[2].a.text.replace('\n', '')
            except (AttributeError, IndexError):
                date = None
            try:
                temp = col[3].text.replace('\n', '')
            except (AttributeError, IndexError):
                temp = None
            try:
                pulse = col[4].text.replace('\n', '')
            except (AttributeError, IndexError):
                pulse = None
            try:
                time = col[5].text.replace('\n', '')
            except (AttributeError, IndexError):
                time = None
            dic = {
                'Рейтинг': rating,  # rank
                'Имя': name,        # name
                'Дата': date,       # date
                'Темп': temp,       # pace
                'Пульс': pulse,     # heart rate
                'Время': time,      # time
            }
            print(dic)
            lis.append(dic)
    return lis
def main():
    authorization()
    num_reting = pages_number(conf.URL_RATING)
    for num in range(1, int(num_reting) + 1):
        data = get_table_data(num)
        print(data)
        # save_db(data)
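Since `get_table_data` parses `response.text` without looking at the status code, a failed request only surfaces later as a parsing error. A small guard along these lines might make that clearer. This is a sketch: `require_ok` is a hypothetical helper, and it relies only on the `status_code`, `url` and `text` attributes of a `requests` response.

```python
def require_ok(response):
    """Return the response body, or raise if the request did not succeed."""
    if response.status_code != 200:
        # Fail here with the URL and status instead of failing later in the parser
        raise RuntimeError(
            "GET {} returned HTTP {}".format(response.url, response.status_code)
        )
    return response.text
```

In `get_table_data` it would be used as `soup = BeautifulSoup(require_ok(response), "lxml")`. The built-in `response.raise_for_status()` from `requests` does a similar job for 4xx/5xx codes.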
The console output shows a 404 error, but I can't tell why:
D:\www\starava_com\venv\lib\site-packages\pymysql\cursors.py:170: Warning: (1366, "Incorrect string value: '\\xE7\\xE8\\xEC\\xE0)' for column 'VARIABLE_VALUE' at row 484")
  result = self._query(query)
https://www.strava.com/segments/9926585/leaderboard?club_id=225082&filter=overall&page=1&per_page=25&partial=true
<Response [404]>
Traceback (most recent call last):
  File "D:/www/starava_com/parser.py", line 149, in <module>
    main()
  File "D:/www/starava_com/parser.py", line 143, in main
    data = get_table_data(num)
  File "D:/www/starava_com/parser.py", line 64, in get_table_data
    table = soup.find('table', {'class':'table table-striped table-padded table-leaderboard'}).find('tbody').find_all('tr')
AttributeError: 'NoneType' object has no attribute 'find'
Process finished with exit code 1
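One thing that can be checked mechanically is how the URL my script requests differs from the one the browser requested. The sketch below compares the query parameters of the two URLs copied from above; `query_params` and `diff_params` are hypothetical helper names. It only shows the difference and does not prove that this is the cause of the 404 (the session cookies and other headers could matter as well).

```python
from urllib.parse import urlsplit, parse_qsl

# URL from the browser's request headers
BROWSER_URL = ("https://www.strava.com/segments/9926585/leaderboard"
               "?club_id=225082&page=1&per_page=25&partial=true")
# URL printed by the script before the 404
SCRIPT_URL = ("https://www.strava.com/segments/9926585/leaderboard"
              "?club_id=225082&filter=overall&page=1&per_page=25&partial=true")


def query_params(url):
    """Return the query string of `url` as a dict."""
    return dict(parse_qsl(urlsplit(url).query))


def diff_params(first, second):
    """Report parameters that are missing from or differ between two URLs."""
    a, b = query_params(first), query_params(second)
    return {
        "only_in_first": {k: a[k] for k in a.keys() - b.keys()},
        "only_in_second": {k: b[k] for k in b.keys() - a.keys()},
        "changed": {k: (a[k], b[k]) for k in a.keys() & b.keys() if a[k] != b[k]},
    }


if __name__ == "__main__":
    print(diff_params(BROWSER_URL, SCRIPT_URL))
```

For these two URLs the only difference is that the script adds `filter=overall` together with `partial=true`, a combination that appears in neither the browser request nor the pagination links.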