@bybacapitan

How do I parse this correctly?

I'm having trouble parsing a site.
Here is the piece of HTML I want to parse.
<a class="cbp-caption" download="" href="https://instagram.fmci2-1.fna.fbcdn.net/v/t51.2885-15/292446813_521081633039407_4511580007793472992_n.jpg?stp=dst-jpg_e35_p1080x1080&amp;_nc_ht=instagram.fmci2-1.fna.fbcdn.net&amp;_nc_cat=107&amp;_nc_ohc=JWQPh-Xz3VkAX_KdWTG&amp;tn=K13lkL_2Iyqe1ziK&amp;edm=APU89FABAAAA&amp;ccb=7-5&amp;oh=00_AT-DX5FPSYEe6L7va4gbqmo3dLEyusw70_XagQ9TJo_Q6Q&amp;oe=62D3F913&amp;_nc_sid=86f79a&amp;dl=1">
<div class="cbp-caption-defaultWrap col-lg-12">
<div class="downloader-post-type"><i class="far fa-image"></i></div>
<img alt="" src="https://cdn.bigbangram.com/media/292446813_521081633039407_4511580007793472992_n.jpg?url=https%3A%2F%2Fscontent.cdninstagram.com%2Fv%2Ft51.2885-15%2F292446813_521081633039407_4511580007793472992_n.jpg%3Fstp%3Ddst-jpg_e35_p1080x1080%26_nc_ht%3Dinstagram.fmci2-1.fna.fbcdn.net%26_nc_cat%3D107%26_nc_ohc%3DJWQPh-Xz3VkAX_KdWTG%26tn%3DK13lkL_2Iyqe1ziK%26edm%3DAPU89FABAAAA%26ccb%3D7-5%26oh%3D00_AT-DX5FPSYEe6L7va4gbqmo3dLEyusw70_XagQ9TJo_Q6Q%26oe%3D62D3F913%26_nc_sid%3D86f79a&amp;time=1657616400&amp;key=4eb222fac7e100f68ec045427e82ec98"/>
<div class="cbp-caption-activeWrap bg-primary">
<div class="cbp-l-caption-alignCenter">
<div class="cbp-l-caption-body">
<h4 class="h6 text-white mb-0">DOWNLOAD</h4>
</div>
</div>
</div>
</div>
</a>, <a class="cbp-caption" download="" href="https://instagram.fmci2-1.fna.fbcdn.net/v/t51.2885-15/292588375_429237229118159_566628364254802004_n.jpg?stp=dst-jpg_e35_p1080x1080&amp;_nc_ht=instagram.fmci2-1.fna.fbcdn.net&amp;_nc_cat=111&amp;_nc_ohc=Vh_rWdy2iwcAX85L8va&amp;edm=APU89FABAAAA&amp;ccb=7-5&amp;oh=00_AT-3Yp7gua54JmUE8wdKPYo4cqnVVC5sLdu9iDdTAc5J-g&amp;oe=62D33660&amp;_nc_sid=86f79a&amp;dl=1">
<div class="cbp-caption-defaultWrap col-lg-12">
<div class="downloader-post-type"><i class="far fa-image"></i></div>
<img alt="" src="https://cdn.bigbangram.com/media/292588375_429237229118159_566628364254802004_n.jpg?url=https%3A%2F%2Fscontent.cdninstagram.com%2Fv%2Ft51.2885-15%2F292588375_429237229118159_566628364254802004_n.jpg%3Fstp%3Ddst-jpg_e35_p1080x1080%26_nc_ht%3Dinstagram.fmci2-1.fna.fbcdn.net%26_nc_cat%3D111%26_nc_ohc%3DVh_rWdy2iwcAX85L8va%26edm%3DAPU89FABAAAA%26ccb%3D7-5%26oh%3D00_AT-3Yp7gua54JmUE8wdKPYo4cqnVVC5sLdu9iDdTAc5J-g%26oe%3D62D33660%26_nc_sid%3D86f79a&amp;time=1657616400&amp;key=0933b0ac41b0a3a1cf58b83c6a7c8c5c"/>
<div class="cbp-caption-activeWrap bg-primary">
<div class="cbp-l-caption-alignCenter">
<div class="cbp-l-caption-body">
<h4 class="h6 text-white mb-0">DOWNLOAD</h4>
</div>
</div>
</div>
</div>
</a>, <a class="cbp-caption" download="" href="https://instagram.fmci2-1.fna.fbcdn.net/v/t51.2885-15/292788263_117592214200698_4990970171600333707_n.jpg?stp=dst-jpg_e35_p1080x1080&amp;_nc_ht=instagram.fmci2-1.fna.fbcdn.net&amp;_nc_cat=106&amp;_nc_ohc=Jea0H5KMGvUAX9XUZCQ&amp;edm=APU89FABAAAA&amp;ccb=7-5&amp;oh=00_AT_759DfrkcHdUyw7MG4NHDn15C-M6ynROeSXP86auIoNQ&amp;oe=62D50F3F&amp;_nc_sid=86f79a&amp;dl=1">
<div class="cbp-caption-defaultWrap col-lg-12">
<div class="downloader-post-type"><i class="far fa-image"></i></div>
<img alt="" src="https://cdn.bigbangram.com/media/292788263_117592214200698_4990970171600333707_n.jpg?url=https%3A%2F%2Fscontent.cdninstagram.com%2Fv%2Ft51.2885-15%2F292788263_117592214200698_4990970171600333707_n.jpg%3Fstp%3Ddst-jpg_e35_p1080x1080%26_nc_ht%3Dinstagram.fmci2-1.fna.fbcdn.net%26_nc_cat%3D106%26_nc_ohc%3DJea0H5KMGvUAX9XUZCQ%26edm%3DAPU89FABAAAA%26ccb%3D7-5%26oh%3D00_AT_759DfrkcHdUyw7MG4NHDn15C-M6ynROeSXP86auIoNQ%26oe%3D62D50F3F%26_nc_sid%3D86f79a&amp;time=1657616400&amp;key=6d177fec4b73ae37c74026e440e535f9"/>
<div class="cbp-caption-activeWrap bg-primary">
<div class="cbp-l-caption-alignCenter">
<div class="cbp-l-caption-body">
<h4 class="h6 text-white mb-0">DOWNLOAD</h4>
</div>
</div>
</div>
</div>
</a>


This is what I'm trying:
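from bs4 import BeautifulSoup as BS  # assumed import; the question does not show it

soup = BS(src, 'html.parser')  # src holds the page HTML
links = soup.find_all('a', {'class': 'cbp-caption'})
for link in links:
    a_href = link.find("a", {"class": "cbp-caption"}).get("href")
    print(a_href)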


I get this error:
AttributeError: 'NoneType' object has no attribute 'get'

What's wrong?
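
A likely cause, judging only from the snippet and the traceback: `find_all` already returns the `<a class="cbp-caption">` tags themselves, and `Tag.find` searches only the descendants of a tag. None of these links contains a nested `<a>`, so `link.find(...)` returns `None`, and calling `.get("href")` on `None` raises the `AttributeError`. Below is a minimal sketch of reading `href` directly from each link. The markup is a shortened stand-in for the HTML in the question, and the `example.com` URLs and the `hrefs` name are made up for illustration.

```python
from bs4 import BeautifulSoup

# Shortened stand-in for the markup in the question (hypothetical URLs).
src = """
<a class="cbp-caption" download="" href="https://example.com/first.jpg?a=1&amp;dl=1">
<div class="cbp-caption-defaultWrap col-lg-12"><img alt="" src="https://example.com/thumb1.jpg"/></div>
</a>
<a class="cbp-caption" download="" href="https://example.com/second.jpg?a=2&amp;dl=1">
<div class="cbp-caption-defaultWrap col-lg-12"><img alt="" src="https://example.com/thumb2.jpg"/></div>
</a>
"""

soup = BeautifulSoup(src, "html.parser")

# find_all returns the matching <a> tags themselves, not their containers.
links = soup.find_all("a", class_="cbp-caption")

# Each item is already the <a> tag, so read the attribute from it directly.
# Tag.get returns None instead of raising if the attribute is missing.
hrefs = [link.get("href") for link in links]

for href in hrefs:
    print(href)
```

In the original loop this amounts to replacing `link.find("a",{"class":"cbp-caption"}).get("href")` with `link.get("href")`. Note that BeautifulSoup decodes `&amp;` in attribute values, so the printed URLs contain a plain `&`.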