I have a script that works fine at first: it correctly collects the links from the first 2-3 pages. After that it starts failing with:
requests.exceptions.ConnectionError: HTTPSConnectionPool(host='krs-pobierz.pl', port=443): Max retries exceeded with url: /b-m-moscinscy-spolka-jawna-i6490187 (Caused by NewConnectionError(': Failed to establish a new connection: [Errno 113] No route to host'))
Wrapping the requests in try/except does not solve the problem, and neither does max_retries.
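By max_retries I mean mounting urllib3's Retry on a Session through an HTTPAdapter, roughly like this (a minimal sketch of that approach; the retry count, backoff and timeout values are just placeholders):

    import requests
    from requests.adapters import HTTPAdapter
    from urllib3.util.retry import Retry

    session = requests.Session()
    # Retry transient failures with exponential backoff before giving up.
    retries = Retry(total=5, backoff_factor=1,
                    status_forcelist=[429, 500, 502, 503, 504])
    session.mount('https://', HTTPAdapter(max_retries=retries))

    r = session.get('https://krs-pobierz.pl/szukaj?q=4110Z&page=1', timeout=10)

Retry only repeats the attempt at the urllib3 level; if the host keeps answering with "No route to host", every attempt fails the same way.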
Here is the code:
import time

import requests
from bs4 import BeautifulSoup

headers = {
    "Accept": "*/*",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64; rv:101.0) Gecko/20100101 Firefox/101.0",
}

first_links = []

for i in range(1, 10):
    url = 'https://krs-pobierz.pl/szukaj?q=4110Z&page={}'.format(i)

    # Fetch the search results page, retrying on connection errors.
    while True:
        try:
            r = requests.get(url, headers=headers)
            break
        except requests.exceptions.ConnectionError:
            time.sleep(2)
            continue

    soup = BeautifulSoup(r.text, 'lxml')
    boxes = soup.find_all('div', class_='col-9')

    for box in boxes:
        link = box.find('a').get('href')

        # Fetch the company page behind each link, again retrying on connection errors.
        while True:
            try:
                q = requests.get(link)
                break
            except requests.exceptions.ConnectionError:
                time.sleep(2)
                continue

        soup = BeautifulSoup(q.content, 'lxml')

        # The activity code sits in the 14th 'col-xs-8' cell; some pages do not have it.
        try:
            kved_text = soup.find_all('td', class_='col-xs-8')[13].text
        except IndexError:
            kved_text = '-'

        if kved_text == 'Realizacja projektów budowlanych związanych ze wznoszeniem budynków (4110Z)':
            first_links.append(link)

with open('first_links.txt', 'a') as f:
    for line in first_links:
        f.write(f'{line}\n')
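My guess is that the host starts dropping connections after too many requests in a row, so the direction I am experimenting with is reusing one Session for everything and pausing between attempts; a rough sketch (the delay and attempt counts are arbitrary):

    import time
    import requests

    session = requests.Session()
    session.headers.update(headers)  # the same headers dict as above

    def fetch(url, attempts=5, pause=3.0):
        # Try a few times, sleeping longer after each failure; give up after `attempts`.
        last_error = None
        for attempt in range(attempts):
            try:
                return session.get(url, timeout=10)
            except requests.exceptions.ConnectionError as exc:
                last_error = exc
                time.sleep(pause * (attempt + 1))
        raise last_error

Every requests.get(...) in the loop above would then become fetch(...).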