Good evening, everyone.
I'm writing a parser and need to step through the pages of a site. Clicking a page number in the pagination reloads the page, but when I request that same page with requests.post(url) I get a 404 back. What am I doing wrong? Advice, or a link to where I can read more about this, would be appreciated.
Thanks in advance.
Chrome DevTools shows two requests.
The first request is named t:
Request URL: https://api.segment.io/v1/t
Request Method: POST
Status Code: 200
Remote Address: 54.68.229.68:443
Referrer Policy: strict-origin-when-cross-origin
access-control-allow-origin: https://www.strava.com
content-length: 21
content-type: application/json
date: Wed, 27 May 2020 17:03:58 GMT
status: 200
vary: Origin
:authority: api.segment.io
:method: POST
:path: /v1/t
:scheme: https
accept: */*
accept-encoding: gzip, deflate, br
accept-language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
content-length: 1075
content-type: text/plain
origin: https://www.strava.com
referer: https://www.strava.com/
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: cross-site
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36
The second request is the link itself, without styles:
https://www.strava.com/segments/9926585/leaderboar...
Request URL: https://www.strava.com/segments/9926585/leaderboard?club_id=225082&filter=overall&page=2&per_page=25&partial=true
Request Method: GET
Status Code: 200
Remote Address: 54.236.197.110:443
Referrer Policy: strict-origin-when-cross-origin
cache-control: no-cache, no-store
content-encoding: gzip
content-type: text/html; charset=utf-8
date: Wed, 27 May 2020 17:03:58 GMT
etag: W/"09fe30db1a037e3a42befc61059332c5"
expires: Sat, 01 Jan 2000 00:00:00 GMT
pragma: no-cache
referrer-policy: strict-origin-when-cross-origin
status: 200
status: 200 OK
via: 1.1 linkerd
x-content-type-options: nosniff
x-download-options: noopen
x-frame-options: DENY
x-permitted-cross-domain-policies: none
x-request-id: b50ece3e-8016-42d3-80d6-53373ca2b404
x-xss-protection: 1; mode=block
:authority: www.strava.com
:method: GET
:path: /segments/9926585/leaderboard?club_id=225082&filter=overall&page=2&per_page=25&partial=true
:scheme: https
accept: text/html, */*; q=0.01
accept-encoding: gzip, deflate, br
accept-language: ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7
cookie: _sp_ses.047d=*; sp=8d0d9e32-dd2c-4ee1-a619-9ffc70fef923; _ga=GA1.2.342525574.1590598662; _gid=GA1.2.1012451690.1590598662; _strava_cookie_banner=true; _strava4_session=n37ibspiu7888qfpe3q4gq2vfmomgh18; ajs_user_id=58983723; ajs_anonymous_id=%228f2d9e3d-8bb8-413d-8767-044e8e346d8b%22; _dc_gtm_UA-6309847-24=1; _sp_id.047d=a5e215a4-b396-4b41-8833-9843cacd95fa.1590598925.0.1590599041..42cb7a22-18a1-489d-802d-fd07c7cdbad3
referer: https://www.strava.com/segments/9926585
sec-fetch-dest: empty
sec-fetch-mode: cors
sec-fetch-site: same-origin
user-agent: Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/83.0.4103.61 Safari/537.36
x-csrf-token: LLPMDakW75EhJQXp5ZGUjRSOn9Myn6KsevLAeurcNjdIWhFI+EFXXHkqHIv7CDMRxTig0NflM30LsSrcr3bLyg==
x-requested-with: XMLHttpRequest
club_id: 225082
filter: overall
page: 2
per_page: 25
partial: true
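The last five lines of the dump are the query-string parameters of the GET request, not a POST body. A minimal sketch of how they map onto the Request URL, using only the standard library; the function name `leaderboard_url` is made up for illustration:

```python
from urllib.parse import urlencode

# Path taken from the Request URL in the DevTools dump.
BASE_URL = "https://www.strava.com/segments/9926585/leaderboard"


def leaderboard_url(page, per_page=25):
    """Build the leaderboard URL for one pagination page (hypothetical helper)."""
    params = {
        "club_id": 225082,
        "filter": "overall",
        "page": page,
        "per_page": per_page,
        "partial": "true",
    }
    # urlencode keeps the dict's insertion order and joins pairs with '&'.
    return BASE_URL + "?" + urlencode(params)


print(leaderboard_url(2))
```

Only `page` changes between pagination clicks, so the other values can stay fixed.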
My code:
from bs4 import BeautifulSoup
import requests
import config as conf  # local module with LOGIN, PASS and URL_AUTHORIZATION

headers = {
    'User-Agent': 'Mozilla/5.0 (X11; Ubuntu; Linux x86_64; rv:72.0) Gecko/20100101 Firefox/72.0',
    'Accept-Language': 'ru-RU,ru;q=0.9,en-US;q=0.8,en;q=0.7'
}
data = {
    'authenticity_token': '',
    'email': conf.LOGIN,
    'password': conf.PASS
}

# One session so that cookies persist between requests
session = requests.Session()
session.get('https://www.strava.com', headers=headers)


def get_token():
    # Read the hidden authenticity_token field from the login form
    response = session.post(conf.URL_AUTHORIZATION)
    soup = BeautifulSoup(response.text, "lxml")
    token = soup.find('input', {'name': 'authenticity_token'}).get('value')
    return token


def authorization():
    # Fetch the token once and reuse it for both the printout and the login
    token = get_token()
    print(token)
    data['authenticity_token'] = token
    session.post(conf.URL_AUTHORIZATION, headers=headers, data=data)


def get_response():
    url = 'https://www.strava.com/segments/9926585/leaderboard?club_id=225082&filter=overall&page={}&per_page={}&partial=true'.format(2, 25)
    # This POST is the request that comes back as 404
    response = session.post(url, headers=headers)
    print(response)


# Entry point
def main():
    authorization()
    get_response()


if __name__ == '__main__':
    main()
As I understand it, I need to take some part of the browser's request and send it along with my POST request, but which part?
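For comparison, here is a sketch of the same request sent the way DevTools reports it: as a GET with query parameters rather than a POST. It is untested against the live site, so whether it alone removes the 404 is an open question. The name `fetch_leaderboard_page` is made up for illustration, `session` is assumed to be the logged-in `requests.Session` from the code above, and copying `Accept`, `Referer` and `X-Requested-With` from the dump is a guess at what the server looks at; the dump also carries an `x-csrf-token`, which is left out here because it is not known whether a GET needs it.

```python
LEADERBOARD_URL = "https://www.strava.com/segments/9926585/leaderboard"


def fetch_leaderboard_page(session, page, per_page=25, headers=None):
    """Request one leaderboard page with GET (hypothetical helper).

    `session` is expected to behave like requests.Session;
    `headers` are optional base headers such as User-Agent.
    """
    params = {
        "club_id": 225082,
        "filter": "overall",
        "page": page,
        "per_page": per_page,
        "partial": "true",
    }
    # Start from the caller's headers, then add the ones the browser sent.
    request_headers = dict(headers or {})
    request_headers.update({
        "Accept": "text/html, */*; q=0.01",
        "Referer": "https://www.strava.com/segments/9926585",
        "X-Requested-With": "XMLHttpRequest",  # assumption: may not be required
    })
    # requests appends `params` to the URL as the query string.
    return session.get(LEADERBOARD_URL, params=params, headers=request_headers)
```

In the code above this would be called after `authorization()`, for example `fetch_leaderboard_page(session, 2, headers=headers)`, and the returned response checked with `response.status_code`.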