strezinov
@strezinov
Learning to be a programmer

How do I parse data from Instagram?

Hi.
I have a script in which I connect to Instagram via cURL in PHP, with a login and password.
I get a 302 from the server, which means everything is OK.
Next I want to pull out the number of posts and followers, and this is where I'm stuck: I don't know what to do next. How do I display the page?

Response headers: 302 and 200
HTTP/1.1 302 Found
Content-Type: text/html; charset=utf-8
Location: https://www.instagram.com/
Vary: Cookie, Accept-Language
Last-Modified: Thu, 29 Nov 2018 18:02:42 GMT
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Cache-Control: private, no-cache, no-store, must-revalidate
Pragma: no-cache
Content-Language: en
Access-Control-Allow-Origin: https://www.instagram.com
Access-Control-Allow-Credentials: true
Date: Thu, 29 Nov 2018 18:02:42 GMT
Strict-Transport-Security: max-age=60
X-Frame-Options: SAMEORIGIN
content-security-policy: report-uri https://www.instagram.com/security/csp_report/; default-src 'self' https://www.instagram.com; img-src https: data: blob:; font-src https: data:; media-src 'self' blob: https://www.instagram.com https://*.cdninstagram.com https://*.fbcdn.net; manifest-src 'self' https://www.instagram.com; script-src 'self' https://instagram.com https://www.instagram.com https://*.www.instagram.com https://*.cdninstagram.com wss://www.instagram.com https://*.facebook.com https://*.fbcdn.net https://*.facebook.net 'unsafe-inline' 'unsafe-eval' blob:; style-src 'self' https://*.www.instagram.com https://www.instagram.com 'unsafe-inline'; connect-src 'self' https://instagram.com https://www.instagram.com https://*.www.instagram.com https://graph.instagram.com https://*.graph.instagram.com https://*.cdninstagram.com https://api.instagram.com wss://www.instagram.com wss://edge-chat.instagram.com https://*.facebook.com https://*.fbcdn.net https://*.facebook.net chrome-extension://boadgeojelhgndaghljhdicfkmllpafd; worker-src 'self' https://www.instagram.com; frame-src 'self' https://instagram.com https://www.instagram.com https://staticxx.facebook.com https://www.facebook.com https://web.facebook.com https://connect.facebook.net https://m.facebook.com; object-src 'none'; upgrade-insecure-requests
X-Content-Type-Options: nosniff
X-XSS-Protection: 0
Set-Cookie: target=""; Domain=instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: target=""; Domain=.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: target=""; Domain=i.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: target=""; Domain=.i.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: target=""; Domain=www.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: target=""; Domain=.www.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: target=""; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: csrftoken=n5Ha3WWujTyeFS2n5YxD9z4kpd6Z3Rui; Domain=.instagram.com; expires=Thu, 28-Nov-2019 18:02:42 GMT; Max-Age=31449600; Path=/; Secure
Set-Cookie: shbid=13753; Domain=.instagram.com; expires=Thu, 06-Dec-2018 18:02:42 GMT; HttpOnly; Max-Age=604800; Path=/; Secure
Set-Cookie: shbts=1543514562.1594174; Domain=.instagram.com; expires=Thu, 06-Dec-2018 18:02:42 GMT; HttpOnly; Max-Age=604800; Path=/; Secure
Set-Cookie: rur=ASH; Domain=.instagram.com; HttpOnly; Path=/; Secure
Set-Cookie: sessionid=5743098867%3AVckbyDvL8DUBYt%3A16; Domain=.instagram.com; expires=Fri, 29-Nov-2019 18:02:42 GMT; HttpOnly; Max-Age=31536000; Path=/; Secure
Connection: keep-alive
Content-Length: 0

HTTP/1.1 200 OK
Content-Type: text/html; charset=utf-8
Cache-Control: private, no-cache, no-store, must-revalidate
Pragma: no-cache
Expires: Sat, 01 Jan 2000 00:00:00 GMT
Vary: Accept-Language, Cookie, Accept-Encoding
Content-Language: en
Access-Control-Allow-Origin: https://www.instagram.com
Access-Control-Allow-Credentials: true
Date: Thu, 29 Nov 2018 18:02:42 GMT
Strict-Transport-Security: max-age=60
X-Frame-Options: SAMEORIGIN
content-security-policy: report-uri https://www.instagram.com/security/csp_report/; default-src 'self' https://www.instagram.com; img-src https: data: blob:; font-src https: data:; media-src 'self' blob: https://www.instagram.com https://*.cdninstagram.com https://*.fbcdn.net; manifest-src 'self' https://www.instagram.com; script-src 'self' https://instagram.com https://www.instagram.com https://*.www.instagram.com https://*.cdninstagram.com wss://www.instagram.com https://*.facebook.com https://*.fbcdn.net https://*.facebook.net 'unsafe-inline' 'unsafe-eval' blob:; style-src 'self' https://*.www.instagram.com https://www.instagram.com 'unsafe-inline'; connect-src 'self' https://instagram.com https://www.instagram.com https://*.www.instagram.com https://graph.instagram.com https://*.graph.instagram.com https://*.cdninstagram.com https://api.instagram.com wss://www.instagram.com wss://edge-chat.instagram.com https://*.facebook.com https://*.fbcdn.net https://*.facebook.net chrome-extension://boadgeojelhgndaghljhdicfkmllpafd; worker-src 'self' https://www.instagram.com; frame-src 'self' https://instagram.com https://www.instagram.com https://staticxx.facebook.com https://www.facebook.com https://web.facebook.com https://connect.facebook.net https://m.facebook.com; object-src 'none'; upgrade-insecure-requests
X-Content-Type-Options: nosniff
X-XSS-Protection: 0
Set-Cookie: sessionid=""; Domain=instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: sessionid=""; Domain=.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: sessionid=""; Domain=i.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: sessionid=""; Domain=.i.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: sessionid=""; Domain=www.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: sessionid=""; Domain=.www.instagram.com; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: sessionid=""; expires=Thu, 01-Jan-1970 00:00:00 GMT; Max-Age=0; Path=/
Set-Cookie: rur=ATN; Domain=.instagram.com; HttpOnly; Path=/; Secure
Set-Cookie: csrftoken=zENCcw21N5OWoFSh94GgacXBeBfYcpgI; Domain=.instagram.com; expires=Thu, 28-Nov-2019 18:02:42 GMT; Max-Age=31449600; Path=/; Secure
Connection: keep-alive
Content-Length: 26281
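
A note on reading the dump above: the 302 response sets a `sessionid` cookie, but the 200 response that follows expires `sessionid` (empty value, `Max-Age=0`), so a 302 alone may not prove that the login survived the redirect. A minimal sketch for checking this programmatically, assuming the raw headers come from `curl_exec()` with `CURLOPT_HEADER` and `CURLOPT_FOLLOWLOCATION` enabled as in the script; `summarizeResponses` is a hypothetical helper name, not part of cURL:

```php
<?php
// Hypothetical helper: split a raw header dump (one header block per response,
// separated by blank lines) into a short summary per response.
function summarizeResponses(string $rawHeaders): array
{
    $summaries = [];
    $blocks = preg_split('/\r?\n\r?\n/', trim($rawHeaders));
    foreach ($blocks as $block) {
        // Only blocks that start with a status line are header blocks.
        if (!preg_match('#^HTTP/\d(?:\.\d)?\s+(\d{3})#', $block, $m)) {
            continue; // e.g. the HTML body
        }
        $session = 'untouched';
        // Inspect every Set-Cookie line that names sessionid; the last one wins.
        if (preg_match_all('/^Set-Cookie:\s*sessionid=([^;\r\n]*)/mi', $block, $c)) {
            foreach ($c[1] as $value) {
                $session = ($value === '' || $value === '""') ? 'cleared' : 'set';
            }
        }
        $summaries[] = ['status' => (int) $m[1], 'sessionid' => $session];
    }
    return $summaries;
}
```

If the last entry reports `'sessionid' => 'cleared'`, the final page was most likely served to an anonymous visitor rather than to the logged-in account.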


$username = "******";
$password = "******";
$useragent = "Mozilla/5.0 (X11; U; Linux x86_64; en-US; rv:1.9.2.13) Gecko/20101206 Ubuntu/10.10 (maverick) Firefox/3.6.13";
$cookie = $username . ".txt";

$url = "https://instagram.com/accounts/login/?force_classic_login";

$ch = curl_init();
curl_setopt($ch, CURLOPT_URL, $url);
curl_setopt($ch, CURLOPT_COOKIEJAR, dirname(__FILE__) . "/" . $cookie);
curl_setopt($ch, CURLOPT_COOKIEFILE, dirname(__FILE__) . "/" . $cookie);
curl_setopt($ch, CURLOPT_USERAGENT, $useragent);
curl_setopt($ch, CURLOPT_RETURNTRANSFER, 1);
curl_setopt($ch, CURLOPT_HEADER, 1);
curl_setopt($ch, CURLOPT_FOLLOWLOCATION, 1);

$page = curl_exec($ch);

// try to find the actual login form
if (!preg_match('/<form method="POST" id="login-form" class="adjacent".*?<\/form>/is', $page, $form)) {
    throw new RuntimeException('Failed to find log in form!');
}

$form = $form[0];

// find the action of the login form
if (!preg_match('/action="([^"]+)"/i', $form, $action)) {
    throw new RuntimeException('Failed to find login form url');
}

$url2 = $action[1]; // this is our new post url
// find all hidden fields which we need to send with our login, this includes security tokens
$count = preg_match_all('/<input type="hidden"\s*name="([^"]*)"\s*value="([^"]*)"/i', $form, $hiddenFields);

$postFields = array();

// turn the hidden fields into an array
for ($i = 0; $i < $count; ++$i) {
    $postFields[$hiddenFields[1][$i]] = $hiddenFields[2][$i];
}

// add our login values
$postFields['username'] = $username;
$postFields['password'] = $password;

// convert to a string: passing an array would make cURL send multipart/form-data,
// and the form only accepts application/x-www-form-urlencoded
$post = http_build_query($postFields);

// set additional curl options using our previous options
// strip a leading slash from the form action to avoid "https://instagram.com//..."
curl_setopt($ch, CURLOPT_URL, "https://instagram.com/" . ltrim($url2, '/'));
curl_setopt($ch, CURLOPT_REFERER, $url);
curl_setopt($ch, CURLOPT_POST, 1);
curl_setopt($ch, CURLOPT_POSTFIELDS, $post);
$page = curl_exec($ch);

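// connect to profile edit page
$url = "https://instagram.com/accounts/edit/";
curl_setopt($ch, CURLOPT_URL, $url);
// switch back to GET: the handle is still in POST mode after the login request
curl_setopt($ch, CURLOPT_HTTPGET, 1);
echo curl_exec($ch);
curl_close($ch);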
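
For the counters themselves, here is a rough sketch of one possible approach. It ASSUMES that the HTML of a public profile page (not `/accounts/edit/`) contains a summary of the form "1,234 Followers, 56 Following, 78 Posts", for example in a description meta tag. That markup is an assumption, is not confirmed by anything in this question, and may differ or change. `extractProfileCounts` is a hypothetical helper name.

```php
<?php
// Hypothetical helper: pull the counters out of profile HTML, ASSUMING the page
// contains text such as "1,234 Followers, 56 Following, 78 Posts".
function extractProfileCounts(string $html): ?array
{
    // A counter may contain separators and an optional k/m suffix, e.g. "12.3k".
    $number = '([\d.,]+[km]?)';
    $pattern = '/' . $number . '\s+Followers,\s+'
        . $number . '\s+Following,\s+'
        . $number . '\s+Posts/i';
    if (!preg_match($pattern, $html, $m)) {
        return null; // markup differs from the assumption, or this is a login page
    }
    // Values are returned as strings because abbreviated counts are not exact.
    return ['followers' => $m[1], 'following' => $m[2], 'posts' => $m[3]];
}
```

It would be called on the string returned by `curl_exec()` for the profile URL, e.g. `$counts = extractProfileCounts($page);`. A `null` result means the expected text was not found, which is also what would happen if the session was not kept and a login page came back instead.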