Solving HTTP 521 Status Codes When Scraping Web Pages with Python

Updated: March 26, 2026, 09:39:12   Author: NorburyL
This article looks at a Python crawler that receives abnormally short HTML when fetching detail pages. The cause is the target site's anti-scraping mechanism, triggered by frequent requests, which answers with HTTP 521 and a JavaScript challenge. The fixes covered include switching machines or IPs, copying the browser's request headers into the code, and increasing the interval between requests, with several code samples along the way.

Problem:

When using a Python crawler to fetch the detail pages linked from a list page, the detail-page HTML that comes back is abnormally short, i.e. the real content is missing.

Cause:

Frequent requests to the target site have triggered its anti-scraping measures; the server answers with status code 521 and a JavaScript challenge instead of the page.

Solutions:

If none of the methods below solve it, you can save whatever page source you are able to fetch and process it offline later.
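A minimal sketch of that fallback (the file name and demo markup are placeholders, not from the original article): write the fetched source to disk once, then parse the local copy instead of re-requesting.

```python
from pathlib import Path

def save_page_source(html, path):
    """Persist raw HTML so parsing can be retried offline without re-requesting."""
    Path(path).write_text(html, encoding="utf-8")

def load_page_source(path):
    """Read a previously saved page back for offline processing."""
    return Path(path).read_text(encoding="utf-8")

save_page_source("<html><body>demo detail page</body></html>", "detail_page.html")
print(load_page_source("detail_page.html"))  # prints the saved markup
```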

Method 1:

Switch VPN or IP, i.e. run the program from a different machine.
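Programmatically, the same effect as switching machines can come from routing requests through a different egress IP via a proxy. A standard-library sketch (the proxy address is a placeholder, not from the original article):

```python
import urllib.request

def opener_via_proxy(proxy_addr):
    """Build a urllib opener whose requests leave through the given proxy,
    so the target site sees a different client IP."""
    handler = urllib.request.ProxyHandler({'http': proxy_addr, 'https': proxy_addr})
    return urllib.request.build_opener(handler)

opener = opener_via_proxy('http://127.0.0.1:8080')  # placeholder address
# opener.open(url) would then fetch through the proxy
```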

Method 2:

Copy the target page's request headers from the browser's developer tools into the code.

Adjust them to match the target site.

import urllib.request
import urllib.error


def askURL(url):
    head = {  # Mimic browser request headers so the server treats us as a regular browser
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Cookie': 'mfw_uuid=61dc38ef-2c67-45ce-ed26-c30fa04f2418; oad_n=a:3:{s:3:"oid";i:1029;s:2:"dm";s:15:"www.mafengwo.cn";s:2:"ft";s:19:"2022-01-10+21:47:27";}; __jsluid_h=aa6e6e4350e2fd0e52cc227da10e26b5; __omc_chl=; __omc_r=; __mfwc=direct; uva=s:78:"a:3:{s:2:"lt";i:1641822448;s:10:"last_refer";s:6:"direct";s:5:"rhost";s:0:"";}";; __mfwurd=a:3:{s:6:"f_time";i:1641822448;s:9:"f_rdomain";s:0:"";s:6:"f_host";s:3:"www";}; __mfwuuid=61dc38ef-2c67-45ce-ed26-c30fa04f2418; UM_distinctid=17e443e711c512-05dd7ff73ec639-5e181552-144000-17e443e711dc58; login=mafengwo; mafengwo=16a582a6e0ca5f6c73654cb640343886_35627906_61e15d7be119c7.29366428_61e15d7be11a11.54996187; PHPSESSID=lf7mtvr2mgj3fhnfd7sn9br1c2; mfw_uid=35627906; Hm_lvt_8288b2ed37e5bc9b4c9f7008798d2de0=1642218839,1642238624,1642341547,1642381972; CNZZDATA30065558=cnzz_eid=1067569765-1641819345-&ntime=1642380961; __jsl_clearance=1642381970.541|0|cYxjLrAJMIg1j5y/qJP9hLaEN7M=; __mfwa=1641822449293.40635.15.1642341546971.1642381972692; __mfwlv=1642381972; __mfwvn=11; bottom_ad_status=0; __mfwb=b5923a0d408d.8.direct; __mfwlt=1642382984; Hm_lpvt_8288b2ed37e5bc9b4c9f7008798d2de0=1642382985',
        'Host': 'www.mafengwo.cn',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55',
    }
    # The User-Agent tells the server what kind of machine/browser is making the
    # request (in effect, what kind of content we can accept)

    request = urllib.request.Request(url, headers=head)
    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode("utf-8")
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print(e.code)
        if hasattr(e, "reason"):
            print(e.reason)
    return html
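The summary also mentions increasing the interval between requests; a minimal rate-limiting sketch (the delay values are illustrative, not from the original article):

```python
import random
import time

def polite_fetch_all(urls, fetch, min_delay=2.0, max_delay=5.0):
    """Call fetch(url) for each URL, sleeping a random interval in between
    so the request pattern looks less like a bot."""
    results = []
    for url in urls:
        results.append(fetch(url))
        time.sleep(random.uniform(min_delay, max_delay))
    return results
```

Any fetch function (such as askURL above) can be passed in; a randomized pause is harder to fingerprint than a fixed sleep.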

Method 3:

Request the target detail page twice: once to obtain the challenge, and once more carrying the computed cookie.
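The code samples below all follow the same idea: the 521 response carries a JavaScript challenge inside a <script> tag, its result is a __jsl_clearance cookie, and only a second request with that cookie gets the real page. The extraction and cookie-parsing steps can be sketched in isolation (the demo HTML is fabricated):

```python
import re

def extract_challenge_js(html):
    """Pull the inline challenge script out of a 521 response body."""
    return ''.join(re.findall(r'<script>(.*?)</script>', html, re.S))

def split_cookie_pair(cookie_str):
    """Turn a document.cookie-style string like '__jsl_clearance=abc; path=/'
    into a (name, value) tuple."""
    first = cookie_str.split(';')[0]
    name, _, value = first.partition('=')
    return name, value

demo = "<html><script>document.cookie='__jsl_clearance=abc123';</script></html>"
print(extract_challenge_js(demo))  # document.cookie='__jsl_clearance=abc123';
print(split_cookie_pair('__jsl_clearance=abc123; path=/'))  # ('__jsl_clearance', 'abc123')
```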

Code sample 1

import execjs
import requests
import re

head = {  # Mimic browser request headers
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'www.mafengwo.cn',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55',
}
url = 'http://www.mafengwo.cn/poi/5423409.html'


def get_521_content(url, head):
    # First request: the server answers 521 with the JS challenge inside a <script> tag
    req = requests.get(url, headers=head)
    cookies = req.cookies

    cookies = '; '.join(['='.join(item) for item in cookies.items()])
    txt_521 = req.text
    txt_521 = ''.join(re.findall('<script>(.*?)</script>', txt_521))
    return (txt_521, cookies)


def fixed_fun(function):
    # `function` is the challenge JS returned by the first request. Strip the
    # browser-only statements so it can run under execjs, and turn the
    # document.cookie assignment into a return value
    evaled_func = function

    mode_func = evaled_func.replace('while(window._phantom||window.__phantomas){};', ''). \
        replace('document.cookie=', 'return').replace(';if((function(){try{return !!window.addEventListener;}', ''). \
        replace("catch(e){return false;}})()){document.addEventListener('DOMContentLoaded',l,false);}", ''). \
        replace("else{document.attachEvent('onreadystatechange',l);}", '').replace(
        r"setTimeout('location.href=location.href.replace(/[\?|&]captcha-challenge/,\'\')',1500);", '')
    content = execjs.compile(mode_func)
    cookies = content.call('l')  # 'l' is the function name used by this challenge variant
    __jsl_clearance = cookies.split(';')[0]
    return __jsl_clearance


def cookie_dict(js, id):
    # Merge the computed __jsl_clearance pair and the server-set cookie pair into one dict
    result = {}
    js = js.split('=')
    id = id.split('=')
    result[js[0]] = js[1]
    result[id[0]] = id[1]
    return result


if __name__ == '__main__':
    func = get_521_content(url,head)
    content = func[0]

    cookie_id = func[1]
    cookie_js = fixed_fun(func[0])
    dicted_cookie = cookie_dict(cookie_js, cookie_id)

    head = {  # Mimic browser request headers
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Host': 'www.mafengwo.cn',
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55',
        'Cookie': cookie_id + ';' + cookie_js
    }
    req = requests.get(url, headers=head)
    print(req.status_code)
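A quick standalone illustration of the cookie-joining expression used in get_521_content, with made-up values, showing how the cookie jar is flattened into a header-style string:

```python
cookie_pairs = {'__jsluid_h': 'abc', '__jsl_clearance': 'xyz'}
header_value = '; '.join(['='.join(item) for item in cookie_pairs.items()])
print(header_value)  # __jsluid_h=abc; __jsl_clearance=xyz
```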

Code sample 2

# resouce:https://blog.csdn.net/qq_41879417/article/details/101701120?spm=1001.2101.3001.6661.1&utm_medium=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-1.pc_relevant_default&depth_1-utm_source=distribute.pc_relevant_t0.none-task-blog-2%7Edefault%7ECTRLIST%7Edefault-1.pc_relevant_default&utm_relevant_index=1
# -*- coding: utf-8 -*-
# @Time : 2022/1/16 9:11
# @Author : sherlock
# @File : creeper_2_521.py
# @Project : creeper

import execjs

import re

import requests

url = 'http://www.mafengwo.cn/poi/5423409.html'

head = {  # Mimic browser request headers
    'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
    'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
    'Cache-Control': 'max-age=0',
    'Connection': 'keep-alive',
    'Host': 'www.mafengwo.cn',
    'Upgrade-Insecure-Requests': '1',
    'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55',
}




def get_521_content(url):
    req = requests.get(url, headers=head, timeout=5)
    print(req.status_code, req.text)
    if req.status_code == 521:
        cookies = dict(req.cookies.items())
        print(cookies)
        js_con = ''.join(re.findall('<script>(.*?)</script>', req.text))
        if js_con:
            __jsl_clearance = fixed_fun(js_con, url)
            if __jsl_clearance:
                key, value = __jsl_clearance.split('=')
                cookies[key] = value
                return cookies


# Run the JS challenge to obtain the __jsl_clearance cookie key/value
def fixed_fun(js_con, url):  # js_con: the JS content fetched by the first request

    func_return = js_con.replace('eval(', 'return(')
    print('After replacing eval with return: ', func_return)
    content = execjs.compile(func_return)
    fn = js_con.split('=')[0].split(' ')[1]  # function name from the leading 'var xx=' of the script
    evaled_func = content.call(fn)
    print('After the first JS evaluation: ', evaled_func)
    fn = evaled_func.split('=')[0].split(' ')[1]  # name of the dynamically generated function
    aa = evaled_func.split("<a href=\\'/\\'>")  # grab the contents of the <a> tag
    aa = aa[1].split("</a>")[0] if len(aa) >= 2 else ''
    mode_func = evaled_func. \
        replace(
        "setTimeout('location.href=location.pathname+location.search.replace(/[\\?|&]captcha-challenge/,\\'\\')',1500);document.cookie=",
        'return'). \
        replace(';if((function(){try{return !!window.addEventListener;}', ''). \
        replace(
        "}catch(e){return false;}})()){document.addEventListener('DOMContentLoaded'," + fn + ",false)}else{document.attachEvent('onreadystatechange'," + fn + ")",
        ''). \
        replace(
        "if((function(){try{return !!window.addEventListener;}catch(e){return false;}})()){document.addEventListener('DOMContentLoaded'," + fn + ",false)}else{document.attachEvent('onreadystatechange'," + fn + ")",
        ''). \
        replace("return'__jsl_clearance", "var window={};return '__jsl_clearance"). \
        replace(
        "var " + fn + "=document.createElement('div');" + fn + ".innerHTML='<a href=\\'/\\'>" + aa + "</a>';" + fn + "=" + fn + ".firstChild.href",
        "var " + fn + "='" + url + "'")
    print('JS after the second round of replacements:', mode_func)
    try:
        content = execjs.compile(mode_func)
        cookies = content.call(fn)
        __jsl_clearance = cookies.split(';')[0]
        print(__jsl_clearance)
        return __jsl_clearance
    except Exception:
        print('JS execution error:', mode_func)
        return None


# Second request for the detail page, carrying the decrypted cookies
def con_spider(cookies, url):
    response = requests.get(url, headers=head, cookies=cookies, timeout=5)
    if response.status_code == 200:
        response.encoding = 'utf-8'
        print(response.status_code)
        print(response.text)
        return response
    else:
        print('Unexpected status code on the second request:', response.status_code)
        return None


if __name__ == "__main__":
    cookies = get_521_content(url)
    if cookies:  # get_521_content returns None when the challenge was not solved
        con_spider(cookies, url)

Code sample 3

# resource:https://www.cnblogs.com/gongs/p/10524710.html

import execjs

import re

import requests

url = 'http://www.mafengwo.cn/poi/5423409.html'

head = {  # Mimic browser request headers
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",

    "Host": "www.mafengwo.cn",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55",
}





def getResponse():
    """
    Send the first request and return the response
    :return:
    """
    response = requests.get(url, headers=head)
    return response


def getJslid(response):
    """
    Join the server-set cookies into one 'key=value; key=value' string
    :param response:
    :return:
    """
    cook = response.cookies
    ans = '; '.join(['='.join(item) for item in cook.items()])
    return ans


def getClearance(response):
    """
    Run the JS challenge and return the '__jsl_clearance=...' pair
    :return:
    """
    txt = ''.join(re.findall('<script>(.*?)</script>', response.text))
    func_return = txt.replace('eval', 'return')

    content = execjs.compile(func_return)

    # NOTE: the author reports that the following call fails under execjs
    eval_func = content.call('x')
    name = re.findall(r'var (.*?)=function.*', eval_func)[0]
    mode_func = eval_func.replace('while(window._phantom||window.__phantomas){};', ''). \
        replace('document.cookie=', 'return').replace('if((function(){try{return !!window.addEventListener;}', ''). \
        replace("catch(e){return false;}})()){document.addEventListener('DOMContentLoaded',%s,false)}" % name, ''). \
        replace("else{document.attachEvent('onreadystatechange',%s)}" % name, '').replace(
        r"setTimeout('location.href=location.pathname+location.search.replace(/[\?|&]captcha-challenge/,\'\')',1500);",
        '')  

    content = execjs.compile(mode_func)
    cookies = content.call(name)
    # print(cookies)
    clearance = cookies.split(';')[0]

    return clearance


def structurehead(cook, clearance):
    """
    Build the new request headers
    :return:
    """

    cookie = {
        'cookie': cook + ';' + clearance
    }

    return dict(head, **cookie)

def main():
    response = getResponse()
    cook = getJslid(response)
    # NOTE: the author reports that getClearance errors inside execjs
    clearance = getClearance(response)
    new_head = structurehead(cook, clearance)
    print(new_head)

if __name__ == '__main__':
    main()

Code sample 4

# -*- coding: utf-8 -*-
# @Time : 2022/1/18 13:32
# @Author : sherlock
# @File : creeper_4_521.py
# @Project : creeper

# coding=utf-8
# author=zhangjingyuan
# python3
from html.parser import HTMLParser
import lxml
from lxml import etree
import urllib.request
import urllib.parse
import urllib.error
import re
import time
import io
import gzip
import random
import codecs
import execjs
import requests

url1 = 'http://www.mafengwo.cn/poi/5423409.html'

url2 = 'https://jobs.51job.com/haikou/135562401.html'

url3 = 'https://movie.douban.com/subject/1292052/'

head = {  # Mimic browser request headers
    # "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    # "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    # "Cache-Control": "max-age=0",
    # "Connection": "keep-alive",
    # "Cookie": 'mfw_uuid=61dc38ef-2c67-45ce-ed26-c30fa04f2418; oad_n=a:3:{s:3:"oid";i:1029;s:2:"dm";s:15:"www.mafengwo.cn";s:2:"ft";s:19:"2022-01-10+21:47:27";}; __jsluid_h=aa6e6e4350e2fd0e52cc227da10e26b5; __omc_chl=; __omc_r=; __mfwc=direct; uva=s:78:"a:3:{s:2:"lt";i:1641822448;s:10:"last_refer";s:6:"direct";s:5:"rhost";s:0:"";}";; __mfwurd=a:3:{s:6:"f_time";i:1641822448;s:9:"f_rdomain";s:0:"";s:6:"f_host";s:3:"www";}; __mfwuuid=61dc38ef-2c67-45ce-ed26-c30fa04f2418; UM_distinctid=17e443e711c512-05dd7ff73ec639-5e181552-144000-17e443e711dc58; login=mafengwo; mafengwo=16a582a6e0ca5f6c73654cb640343886_35627906_61e15d7be119c7.29366428_61e15d7be11a11.54996187; __jsl_clearance=1642341544.979|0|fafiHNHGZB+baEyxg5NVjPfVXm0=; PHPSESSID=s4foj9fhkm3mq8rs64omagvvp2; mfw_uid=35627906; __mfwa=1641822449293.40635.14.1642238623523.1642341546971; __mfwlv=1642341546; __mfwvn=10; Hm_lvt_8288b2ed37e5bc9b4c9f7008798d2de0=1642215122,1642218839,1642238624,1642341547; CNZZDATA30065558=cnzz_eid=1067569765-1641819345-&ntime=1642337760; bottom_ad_status=0; uol_throttle=35627906; __mfwb=8cc49c72508e.10.direct; __mfwlt=1642343676; Hm_lpvt_8288b2ed37e5bc9b4c9f7008798d2de0=1642343676',
    # "Host": "www.mafengwo.cn",
    # "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55",
}


def getResponse():
    """
    Send the first request and return the response
    :return:
    """
    response = requests.get(url1, headers=head)
    return response


def getJslid(response):
    """
    Join the server-set cookies into one 'key=value; key=value' string
    :param response:
    :return:
    """
    cook = response.cookies
    return '; '.join(['='.join(item) for item in cook.items()])


def getClearance(response):
    """
    Run the JS challenge and return the '__jsl_clearance=...' pair
    :return:
    """
    txt = ''.join(re.findall('<script>(.*?)</script>', response.text))
    func_return = txt.replace('eval', 'return')
    print(func_return)

    content = execjs.compile(func_return)
    print(type(content))
    # content = open("jsdom_document").read()
    # print(content)
    # execjs._exceptions.ProgramError: ReferenceError: document is not defined
    eval_func = content.call('x')

    name = re.findall(r'var (.*?)=function.*', eval_func)[0]

    mode_func = eval_func.replace('while(window._phantom||window.__phantomas){};', ''). \
        replace('document.cookie=', 'return').replace('if((function(){try{return !!window.addEventListener;}', ''). \
        replace("catch(e){return false;}})()){document.addEventListener('DOMContentLoaded',%s,false)}" % name, ''). \
        replace("else{document.attachEvent('onreadystatechange',%s)}" % name, '').replace(
        r"setTimeout('location.href=location.pathname+location.search.replace(/[\?|&]captcha-challenge/,\'\')',1500);",
        '')

    content = execjs.compile(mode_func)
    cookies = content.call(name)
    # print(cookies)
    clearance = cookies.split(';')[0]

    return clearance


def structureCookie(cook, clearance):
    """
    Build the new Cookie header value
    :return:
    """

    cookie = cook + ';' + clearance
    print(cookie)

    return cookie


if __name__ == '__main__':
    response = getResponse()
    clearance = getClearance(response)
    cook = getJslid(response)

    head = {  # Mimic browser request headers
        'Accept': 'text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9',
        'Accept-Language': 'zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6',
        'Cache-Control': 'max-age=0',
        'Connection': 'keep-alive',
        'Host': 'www.mafengwo.cn',
        'Cookie': cook,
        'Upgrade-Insecure-Requests': '1',
        'User-Agent': 'Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55',
    }


    request = urllib.request.Request(url2, headers=head)

    html = ""
    try:
        response = urllib.request.urlopen(request)
        html = response.read().decode(encoding="utf-8", errors="ignore")
        print(html)
    except urllib.error.URLError as e:
        if hasattr(e, "code"):
            print("Status code: %s" % e.code)
        if hasattr(e, "reason"):
            print("Reason: %s" % e.reason)

Code sample 5

# -*- coding: utf-8 -*-
# @Time : 2022/1/18 17:43
# @Author : sherlock
# @File : creeper_5_seleu.py
# @Project : creeper


# -*-  coding:utf-8 -*-

import requests
from bs4 import BeautifulSoup
import redis
from selenium import webdriver
from selenium.webdriver.chrome.options import Options
import smtplib
import email.utils
from email.mime.text import MIMEText
import time

url = 'https://www.ipip.net'


def driver_chrome():
    # Headless Chrome setup; newer Selenium releases use the `options=` keyword
    # in place of the deprecated `chrome_options=`
    chrome_options = Options()
    chrome_options.add_argument("--headless")
    chrome_options.add_argument('--no-sandbox')
    chrome_options.add_argument("--disable-gpu")
    chrome_options.add_argument("--window-size=1920x1080")
    driver = webdriver.Chrome(options=chrome_options)
    return driver


def mymail(content):
    msg = MIMEText(content, _subtype='plain', _charset='utf8')
    msg['From'] = email.utils.formataddr(('Author', '989989797@qq.com'))
    msg['To'] = email.utils.formataddr(('Recipient', '8979879879@me.com'))
    msg['date'] = time.strftime('%a, %d %b %Y %H:%M:%S %z')
    msg['Subject'] = 'Your ip address'
    return msg


r = redis.Redis(host='localhost', port=6379, decode_responses=True)
myip = r.get('myip')
driver = driver_chrome()
driver.get(url)
cookies = driver.get_cookies()
new_cookies = {}
# Re-attach the cookies the browser received so the second page load passes the check
for i in cookies:
    driver.add_cookie({'name': i.get('name'), 'value': i.get('value')})
driver.get(url)
soup = BeautifulSoup(driver.page_source, features='lxml')
myres = soup.find_all('div', attrs={'class': 'yourInfo'})
trueip = myres[0].find_all('a')[0].text
msg = mymail(trueip)

with smtplib.SMTP_SSL('smtp.qq.com', 465) as server:
    server.login('80988988@qq.com', '9jsdfhjhfio')
    if myip != trueip:
        r.set('myip', trueip)
        server.sendmail('98198397@qq.com', '9879878798@me.com', msg.as_string())

driver.close()
driver.quit()

Test code

# coding=utf-8
# author=zhangjingyuan
# python3
from html.parser import HTMLParser
import lxml
import requests
from lxml import etree
import urllib.request
import urllib.parse
import re
import time
import io
import gzip
import random
import codecs

url1 = 'http://www.mafengwo.cn/poi/5423409.html'

url2 = 'https://jobs.51job.com/haikou/135562401.html'

url3 ='https://movie.douban.com/subject/1292052/'

url4 = 'http://www.mafengwo.cn/search/q.php?q=%E6%B3%89%E5%B7%9E'

head = {  # Mimic browser request headers
    "Accept": "text/html,application/xhtml+xml,application/xml;q=0.9,image/webp,image/apng,*/*;q=0.8,application/signed-exchange;v=b3;q=0.9",
    "Accept-Language": "zh-CN,zh;q=0.9,en;q=0.8,en-GB;q=0.7,en-US;q=0.6",
    "Cache-Control": "max-age=0",
    "Connection": "keep-alive",
    "Cookie": 'mfw_uuid=61dc38ef-2c67-45ce-ed26-c30fa04f2418; oad_n=a:3:{s:3:"oid";i:1029;s:2:"dm";s:15:"www.mafengwo.cn";s:2:"ft";s:19:"2022-01-10+21:47:27";}; __jsluid_h=aa6e6e4350e2fd0e52cc227da10e26b5; __omc_chl=; __omc_r=; __mfwc=direct; uva=s:78:"a:3:{s:2:"lt";i:1641822448;s:10:"last_refer";s:6:"direct";s:5:"rhost";s:0:"";}";; __mfwurd=a:3:{s:6:"f_time";i:1641822448;s:9:"f_rdomain";s:0:"";s:6:"f_host";s:3:"www";}; __mfwuuid=61dc38ef-2c67-45ce-ed26-c30fa04f2418; UM_distinctid=17e443e711c512-05dd7ff73ec639-5e181552-144000-17e443e711dc58; login=mafengwo; mafengwo=16a582a6e0ca5f6c73654cb640343886_35627906_61e15d7be119c7.29366428_61e15d7be11a11.54996187; __jsl_clearance=1642341544.979|0|fafiHNHGZB+baEyxg5NVjPfVXm0=; PHPSESSID=s4foj9fhkm3mq8rs64omagvvp2; mfw_uid=35627906; __mfwa=1641822449293.40635.14.1642238623523.1642341546971; __mfwlv=1642341546; __mfwvn=10; Hm_lvt_8288b2ed37e5bc9b4c9f7008798d2de0=1642215122,1642218839,1642238624,1642341547; CNZZDATA30065558=cnzz_eid=1067569765-1641819345-&ntime=1642337760; bottom_ad_status=0; uol_throttle=35627906; __mfwb=8cc49c72508e.10.direct; __mfwlt=1642343676; Hm_lpvt_8288b2ed37e5bc9b4c9f7008798d2de0=1642343676',
    "Host": "www.mafengwo.cn",
    "Upgrade-Insecure-Requests": "1",
    "User-Agent": "Mozilla/5.0 (Windows NT 10.0; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/97.0.4692.71 Safari/537.36 Edg/97.0.1072.55",
}

# # Print the page's status code
# req = requests.get(url, headers=head).status_code
# print(req)

request = urllib.request.Request(url1, headers=head)

html = ""
try:
    response = urllib.request.urlopen(request)
    html = response.read().decode(encoding="utf-8", errors="ignore")
    print(html)
except urllib.error.URLError as e:
    if hasattr(e, "code"):
        print("Status code: %s" % e.code)
    if hasattr(e, "reason"):
        print("Reason: %s" % e.reason)

# response = requests.get(url1)
# print(response)

# # first-request cookies
# cookie1 = response.cookies
# print(cookie1)
# # JS challenge code
# js_code = response.text
# print(js_code)

That concludes this walkthrough of handling HTTP 521 status codes when scraping with Python. For more material on the topic, see the other related articles on 脚本之家 (Script Home)!
