Querying Baidu keyword rankings with Python

Updated: 2014-03-30 10:29:08

This article shows how to query Baidu keyword rankings with Python, as a reference for readers who need it.

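This is a simple Python function that looks up where a domain ranks on Baidu for a given keyword. Its main features:
1. Random User-Agent: each request picks a UA string at random.
2. Simple to use: just call getRank(keyword, domain).
3. Encoding conversion: the keyword is decoded as UTF-8 with a GB2312 fallback, which should cover the common cases.
4. Rich results: besides the rank, it returns the result's title, URL, and cache (snapshot) date, which suits SEO work.
5. Easy to wrap in a tool or to use on its own.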
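
Items 1 and 3 can be sketched in isolation. The following is a minimal Python 3 sketch, not the article's own code (which targets Python 2). The names `UA_LIST`, `random_headers` and `decode_any_word` are hypothetical, and the two User-Agent strings are only examples.

```python
import random

# Hypothetical pool of User-Agent strings; any real browser UAs would do.
UA_LIST = [
    "Mozilla/5.0 (Windows NT 6.1; rv:11.0) Gecko/20100101 Firefox/11.0",
    "Mozilla/4.0 (compatible; MSIE 8.0; Windows NT 5.1; Trident/4.0; SV1)",
]


def random_headers():
    """Return request headers with a randomly chosen User-Agent."""
    return {"User-Agent": random.choice(UA_LIST)}


def decode_any_word(raw):
    """Decode bytes as UTF-8, falling back to GB2312; pass str through."""
    if isinstance(raw, str):
        return raw
    try:
        return raw.decode("utf-8")
    except UnicodeDecodeError:
        return raw.decode("gb2312")


if __name__ == "__main__":
    print(random_headers())
    print(decode_any_word("百度".encode("gb2312")))
```

Trying UTF-8 first works because GB2312-encoded Chinese text is almost never valid UTF-8, so the fallback only triggers when it is actually needed.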

The function is single-threaded, so it is slow; adapt it to your own needs.
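
One possible way around the single-threaded limitation is a small thread pool. This is only a sketch under assumptions: it is written for Python 3, `check_rank` is a hypothetical placeholder standing in for a real lookup such as getRank(keyword, domain), and `check_many` and `workers` are names invented for illustration.

```python
from concurrent.futures import ThreadPoolExecutor


def check_rank(keyword, domain):
    """Placeholder for a real lookup such as getRank(keyword, domain)."""
    return "%s @ %s" % (keyword, domain)


def check_many(keywords, domain, workers=4):
    """Look up several keywords concurrently, keeping the input order."""
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(lambda kw: check_rank(kw, domain), keywords))


if __name__ == "__main__":
    for line in check_many(["python", "seo"], "www.example.com"):
        print(line)
```

Threads fit here because each lookup spends most of its time waiting on the network. Keeping `workers` small is probably wise, since many parallel requests to a search engine are likely to be throttled.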

The code is as follows:

#coding=utf-8
#written for Python 2; `BeautifulSoup` here is the BeautifulSoup 3 module

import requests
import BeautifulSoup
import re
import random

def decodeAnyWord(w):   #decode a byte string as UTF-8, falling back to GB2312
    try:
        return w.decode('utf-8')
    except UnicodeDecodeError:
        return w.decode('gb2312')

def createURL(checkWord):   #create baidu URL with search words
    checkWord = checkWord.strip()
    checkWord = checkWord.replace(' ', '+').replace('\n', '')
    baiduURL = 'http://www.baidu.com/s?wd=%s&rn=100' % checkWord
    return baiduURL

def getContent(baiduURL):   #get the content of the serp
    uaList = ['Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+1.1.4322;+TencentTraveler)',
    'Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1;+.NET+CLR+2.0.50727;+.NET+CLR+3.0.4506.2152;+.NET+CLR+3.5.30729)',
    'Mozilla/5.0+(Windows+NT+5.1)+AppleWebKit/537.1+(KHTML,+like+Gecko)+Chrome/21.0.1180.89+Safari/537.1',
    'Mozilla/4.0+(compatible;+MSIE+6.0;+Windows+NT+5.1;+SV1)',
    'Mozilla/5.0+(Windows+NT+6.1;+rv:11.0)+Gecko/20100101+Firefox/11.0',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+SV1)',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+GTB7.1;+.NET+CLR+2.0.50727)',
    'Mozilla/4.0+(compatible;+MSIE+8.0;+Windows+NT+5.1;+Trident/4.0;+KB974489)']
    headers = {'User-Agent': random.choice(uaList)}

    r = requests.get(baiduURL, headers = headers)
    return r.content

def getLastURL(rawurl): #get final URL while there're redirects
    r = requests.get(rawurl)
    return r.url

def getAtext(atext): #get the text with <a> and </a>
    pat = re.compile(r'<a .*?>(.*?)</a>')
    match = pat.findall(atext.replace('\n', ''))
    #the tag names inside replace() were lost in extraction; <em> and </em> are assumed
    pureText = match[0].replace('<em>', '').replace('</em>', '')
    return pureText.replace('\n', '')

def getCacheDate(t): #get the date of cache
    pat = re.compile(r'.*?(\d{4}-\d{1,2}-\d{1,2}) ')
    match = pat.findall(t)
    cacheDate = match[0]
    return cacheDate

def getRank(checkWord, domain): #main line
    checkWord = checkWord.replace('\n', '')
    checkWord = decodeAnyWord(checkWord)
    baiduURL = createURL(checkWord)
    cont = getContent(baiduURL)
    soup = BeautifulSoup.BeautifulSoup(cont)
    results = soup.findAll('table', {'class': 'result'})    #find all results in this page

    for result in results:
        checkData = unicode(result.find('span', {'class': 'g'}))
        #the tag names inside replace() were lost in extraction; <b> and </b> are assumed
        checkData = checkData.replace('<b>', '').replace('</b>', '')
        if re.compile(r'^[^/]*%s.*?' % re.escape(domain)).match(checkData): #adjust the regex if needed
            nowRank = result['id'] #get the rank if match the domain info

            resLink = result.find('h3').a
            resURL = resLink['href']
            domainURL = getLastURL(resURL) #get the target URL
            resTitle = getAtext(unicode(resLink))   #get the title of the target page

            rescache = result.find('span', {'class': 'g'})
            cacheDate = getCacheDate(unicode(rescache)) #get the cache date of the target page

            res = u'%s, 第%s名, %s, %s, %s' % (checkWord, nowRank, resTitle, cacheDate, domainURL)
            return res.encode('gb2312')
    return '>100'   #the domain is not in the first 100 results

domain = 'www.baidu.com' #set the domain which you want to search.
print getRank('百度', domain)
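
Two steps of the listing are worth isolating: building the search URL and matching the domain against a result's displayed URL. The following is a minimal Python 3 sketch rather than the article's code. `create_url`, `matches_domain` and `per_page` are hypothetical names, and the sketch assumes the keyword should be percent-encoded and the domain treated as literal text rather than as a regular expression.

```python
import re
from urllib.parse import urlencode


def create_url(check_word, per_page=100):
    """Build the search URL, percent-encoding the keyword."""
    query = urlencode({"wd": check_word.strip(), "rn": per_page})
    return "http://www.baidu.com/s?" + query


def matches_domain(display_url, domain):
    """True if `domain` appears before the first '/' of the displayed URL."""
    pattern = re.compile(r"^[^/]*" + re.escape(domain))
    return pattern.match(display_url) is not None


if __name__ == "__main__":
    print(create_url("python seo"))
    print(matches_domain("www.baidu.com/ 2014-03-29", "www.baidu.com"))
```

Escaping the domain keeps the dots from matching arbitrary characters, and the `[^/]*` prefix confines the match to the host part, so a domain that only appears in the path is not counted.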
