Python: Fetching a Web Page's Source Code

Python 3.6

Libraries: urllib3, bs4

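The main program is meant to scrape Amazon's book sales ranking data, but Amazon appears to have anti-crawler measures in place and rejects requests that look like they come from a bot, so for now Baidu stands in for that part.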

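For simple page fetching, the commonly used urllib.request is actually enough, but urllib3 offers more features and is more widely applicable, so it is worth learning.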
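For comparison, here is a minimal sketch of the urllib.request approach mentioned above. The function name fetch_page_bytes and its user_agent and timeout parameters are illustrative assumptions, not part of the original program. Sending a browser-like User-Agent is a common first step when a site rejects the default client, though it may not be enough to get past a site's anti-crawler checks.

```python
import urllib.request


def fetch_page_bytes(url, user_agent=None, timeout=10):
    """Fetch url with the standard library and return the body as bytes.

    Hypothetical helper for illustration; not from the original article.
    """
    headers = {}
    if user_agent:
        # Some sites reject the default "Python-urllib/x.y" User-Agent.
        headers["User-Agent"] = user_agent
    request = urllib.request.Request(url, headers=headers)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        # read() returns the raw, undecoded response body.
        return response.read()
```

Calling fetch_page_bytes('https://www.baidu.com') should return a bytes object of the same kind that page.data holds in the urllib3 version.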

First, import the modules:

import urllib3, bs4

Define the page to visit:

urltest = 'https://www.baidu.com'

Define the function. It compares two ways of decoding the response:

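def httpget():
    http = urllib3.PoolManager()  # first create a PoolManager instance
    urllib3.disable_warnings()  # suppress warnings about unverified HTTPS certificates
    page = http.request('GET', urltest)  # send the GET request
    print(page.status)  # HTTP status code returned by the server
    print(page.data)  # response body returned by the server, as raw bytes
    print(page.data.decode())  # decode with the default 'utf-8' encoding
    res = bs4.BeautifulSoup(page.data, 'lxml')  # parse with the lxml parser; BeautifulSoup detects the encoding itself
    print(res)
    return None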

 

Running httpget() prints the following:

200
b'<!DOCTYPE html><!--STATUS OK-->\r\n<html>\r\n<head>\r\n\t<meta http-equiv="content-type" content="text/html;charset=utf-8">\r\n\t<meta http-equiv="X-UA-Compatible" content="IE=Edge">\r\n\t<link rel="dns-prefetch" href="//s1.bdstatic.com"/>\r\n ... \t<title>\xe7\x99\xbe\xe5\xba\xa6\xe4\xb8\x80\xe4\xb8\x8b\xef\xbc\x8c\xe4\xbd\xa0\xe5\xb0\xb1\xe7\x9f\xa5\xe9\x81\x93</title>\r\n ... </body></html>\r\n'
[bytes literal abridged; non-ASCII text appears as \x escapes, and the full decoded page follows]
<!DOCTYPE html><!--STATUS OK-->
<html>
<head>
<meta http-equiv="content-type" content="text/html;charset=utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=Edge">
<link rel="dns-prefetch" href="//s1.bdstatic.com"/>
<link rel="dns-prefetch" href="//t1.baidu.com"/>
<link rel="dns-prefetch" href="//t2.baidu.com"/>
<link rel="dns-prefetch" href="//t3.baidu.com"/>
<link rel="dns-prefetch" href="//t10.baidu.com"/>
<link rel="dns-prefetch" href="//t11.baidu.com"/>
<link rel="dns-prefetch" href="//t12.baidu.com"/>
<link rel="dns-prefetch" href="//b1.bdstatic.com"/>
<title>百度一下,你就知道</title>
<link href="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/home/css/index.css" rel="stylesheet" type="text/css" />
<!--[if lte IE 8]><style index="index" >#content#m</style><![endif]-->
<!--[if IE 8]><style index="index" >#u1 a.mnav,#u1 a.mnav:visited</style><![endif]-->
<script>var hashMatch = document.location.href.match(/#+(.*wd=[^&].+)/);if (hashMatch && hashMatch[0] && hashMatch[1]) var ns_c = function(){};</script>
<script>function h(obj)</script>
<noscript><meta http-equiv="refresh" content="0; url=/baidu.html?from=noscript"/></noscript>
<script>window._ASYNC_START=new Date().getTime();</script>
</head>
<body link="#0000cc"><div id="wrapper" style="display:none;"><div id="u"><a href="//www.baidu.com/gaoji/preferences.html" onmousedown="return user_c({'fm':'set','tab':'setting','login':'0'})">搜索设置</a>|<a id="btop" href="/" onmousedown="return user_c({'fm':'set','tab':'index','login':'0'})">百度首页</a>|<a id="lb" href="https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F" onclick="return false;" onmousedown="return user_c({'fm':'set','tab':'login'})">登录</a><a href="https://passport.baidu.com/v2/?reg&regType=1&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F" onmousedown="return user_c({'fm':'set','tab':'reg'})" target="_blank" class="reg">注册</a></div><div id="head"><div class="s_nav"><a href="/" class="s_logo" onmousedown="return c({'fm':'tab','tab':'logo'})"><img src="//www.baidu.com/img/baidu_jgylogo3.gif" width="117" height="38" border="0" alt="到百度首页" title="到百度首页"></a><div class="s_tab" id="s_tab"><a href="http://news.baidu.com/ns?cl=2&rn=20&tn=news&word=" wdfield="word" onmousedown="return c({'fm':'tab','tab':'news'})">新闻</a>&#12288;<b>网页</b>&#12288;<a href="http://tieba.baidu.com/f?kw=&fr=wwwt" wdfield="kw" onmousedown="return c({'fm':'tab','tab':'tieba'})">贴吧</a>&#12288;<a href="http://zhidao.baidu.com/q?ct=17&pn=0&tn=ikaslist&rn=10&word=&fr=wwwt" wdfield="word" onmousedown="return c({'fm':'tab','tab':'zhidao'})">知道</a>&#12288;<a href="http://music.baidu.com/search?fr=ps&key=" wdfield="key" onmousedown="return c({'fm':'tab','tab':'music'})">音乐</a>&#12288;<a href="http://image.baidu.com/i?tn=baiduimage&ps=1&ct=201326592&lm=-1&cl=2&nc=1&word=" wdfield="word" onmousedown="return c({'fm':'tab','tab':'pic'})">图片</a>&#12288;<a href="http://v.baidu.com/v?ct=301989888&rn=20&pn=0&db=0&s=25&word=" wdfield="word" onmousedown="return c({'fm':'tab','tab':'video'})">视频</a>&#12288;<a href="http://map.baidu.com/m?word=&fr=ps01000" wdfield="word" onmousedown="return c({'fm':'tab','tab':'map'})">地图</a>&#12288;<a href="http://wenku.baidu.com/search?word=&lm=0&od=0" 
wdfield="word" onmousedown="return c({'fm':'tab','tab':'wenku'})">文库</a>&#12288;<a href="//www.baidu.com/more/" onmousedown="return c({'fm':'tab','tab':'more'})">更多»</a></div></div><form id="form" name="f" action="/s" class="fm" ><input type="hidden" name="ie" value="utf-8"><input type="hidden" name="f" value="8"><input type="hidden" name="rsv_bp" value="1"><span class="bg s_ipt_wr"><input name="wd" id="kw" class="s_ipt" value="" maxlength="100"></span><span class="bg s_btn_wr"><input type="submit" id="su" value="百度一下" class="bg s_btn" onmousedown="this.className='bg s_btn s_btn_h'" onmouseout="this.className='bg s_btn'"></span><span class="tools"><span id="mHolder"><div id="mCon"><span>输入法</span></div><ul id="mMenu"><li><a href="javascript:;" name="ime_hw">手写</a></li><li><a href="javascript:;" name="ime_py">拼音</a></li><li class="ln"></li><li><a href="javascript:;" name="ime_cl">关闭</a></li></ul></span><span class="shouji"><strong>推荐&nbsp;:&nbsp;</strong><a href="http://w.x.baidu.com/go/mini/8/10000020" onmousedown="return ns_c({'fm':'behs','tab':'bdbrowser'})">百度浏览器,打开网页快2秒!</a></span></span></form></div><div id="content"><div id="u1"><a href="http://news.baidu.com" name="tj_trnews" class="mnav">新闻</a><a href="http://www.hao123.com" name="tj_trhao123" class="mnav">hao123</a><a href="http://map.baidu.com" name="tj_trmap" class="mnav">地图</a><a href="http://v.baidu.com" name="tj_trvideo" class="mnav">视频</a><a href="http://tieba.baidu.com" name="tj_trtieba" class="mnav">贴吧</a><a href="https://passport.baidu.com/v2/?login&tpl=mn&u=http%3A%2F%2Fwww.baidu.com%2F" name="tj_login" id="lb" onclick="return false;">登录</a><a href="//www.baidu.com/gaoji/preferences.html" name="tj_settingicon" id="pf">设置</a><a href="//www.baidu.com/more/" name="tj_briicon" id="bri">更多产品</a></div><div id="m"><p id="lg"><img src="//www.baidu.com/img/bd_logo.png" width="270" height="129"></p><p id="nv"><a href="http://news.baidu.com">&nbsp;</a> <b>&nbsp;</b> <a 
href="http://tieba.baidu.com">&nbsp;</a> <a href="http://zhidao.baidu.com">&nbsp;</a> <a href="http://music.baidu.com">&nbsp;</a> <a href="http://image.baidu.com">&nbsp;</a> <a href="http://v.baidu.com">&nbsp;</a> <a href="http://map.baidu.com">&nbsp;</a></p><div id="fm"><form id="form1" name="f1" action="/s" class="fm"><span class="bg s_ipt_wr"><input type="text" name="wd" id="kw1" maxlength="100" class="s_ipt"></span><input type="hidden" name="rsv_bp" value="0"><input type=hidden name=ch value=""><input type=hidden name=tn value="baidu"><input type=hidden name=bar value=""><input type="hidden" name="rsv_spt" value="3"><input type="hidden" name="ie" value="utf-8"><span class="bg s_btn_wr"><input type="submit" value="百度一下" id="su1" class="bg s_btn" onmousedown="this.className='bg s_btn s_btn_h'" onmouseout="this.className='bg s_btn'"></span></form><span class="tools"><span id="mHolder1"><div id="mCon1"><span>输入法</span></div></span></span><ul id="mMenu1"><div class="mMenu1-tip-arrow"><em></em><ins></ins></div><li><a href="javascript:;" name="ime_hw">手写</a></li><li><a href="javascript:;" name="ime_py">拼音</a></li><li class="ln"></li><li><a href="javascript:;" name="ime_cl">关闭</a></li></ul></div><p id="lk"><a href="http://baike.baidu.com">百科</a> <a href="http://wenku.baidu.com">文库</a> <a href="http://www.hao123.com">hao123</a><span>&nbsp;|&nbsp;<a href="//www.baidu.com/more/">更多&gt;&gt;</a></span></p><p id="lm"></p></div></div><div id="ftCon"><div id="ftConw"><p id="lh"><a id="seth" onClick="h(this)" href="/" onmousedown="return ns_c({'fm':'behs','tab':'homepage','pos':0})">把百度设为主页</a><a id="setf" href="//www.baidu.com/cache/sethelp/index.html" onmousedown="return ns_c({'fm':'behs','tab':'favorites','pos':0})" target="_blank">把百度设为主页</a><a onmousedown="return ns_c({'fm':'behs','tab':'tj_about'})" href="http://home.baidu.com">关于百度</a><a onmousedown="return ns_c({'fm':'behs','tab':'tj_about_en'})" href="http://ir.baidu.com">About Baidu</a></p><p 
id="cp">&copy;2018&nbsp;Baidu&nbsp;<a href="/duty/" name="tj_duty">使用百度前必读</a>&nbsp;京ICP证030173号&nbsp;<img src="http://s1.bdstatic.com/r/www/cache/static/global/img/gs_237f015b.gif"></p></div></div><div id="wrapper_wrapper"></div></div><div class="c-tips-container" id="c-tips-container"></div>
<script>window.__async_strategy=2;</script>
<script>var bds=,su:},util:{},use:{},comm : ,_base64:};var name,navigate,al_arr=[];var selfOpen = window.open;eval("var open = selfOpen;");var isIE=navigator.userAgent.indexOf("MSIE")!=-1&&!window.opera;var E = bds.ecom= {};bds.se.mon = {'loadedItems':[],'load':function(){},'srvt':-1};try catch(e){}</script>
<script>if(!location.hash.match(/[^a-zA-Z0-9]wd=/))catch(e){}},0);}</script>
<script type="text/javascript" src="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/jquery/jquery-1.10.2.min_f2fb5194.js"></script>
<script>(function(){var index_content = $('#content');var index_foot= $('#ftCon');var index_css= $('head [index]');var index_u= $('#u1');var result_u= $('#u');var wrapper=$("#wrapper");window.index_on=function()setTimeout(function(){try{$('#kw1').get(0).focus();window.sugIndex.start();}catch(e){}},0);if(typeof initIndex=='function')};window.index_off=function();})();</script>
<script>window.__switch_add_mask=1;</script>
<script type="text/javascript" src="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/global/js/instant_search_newi_redirect1_20bf4036.js"></script>
<script>initPreload();$("#u,#u1").delegate("#lb",'click',function(){trycatch(e){}});if(navigator.cookieEnabled)</script>
<script>$(function(){for(i=0;i<3;i++)function u(iptwr,ipt,btnwr,btn){if(iptwr && ipt)).on('mouseout',function()).on('click',function());ipt.on('focus',function()).on('blur',function()).on('render',function(e){var $s = iptwr.parent().find('.bdsug');var l = $s.find('li').length;if(l>=5){$s.addClass('bdsugbg');}else{$s.removeClass('bdsugbg');}});}if(btnwr && btn)).on('mouseout',function());}}});</script>
<script type="text/javascript" src="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/home/js/bri_7f1fa703.js"></script>
<script>(function(){var _init=false;window.initIndex=function(){if(_init){return;}_init=true;var w=window,d=document,n=navigator,k=d.f1.wd,a=d.getElementById("nv").getElementsByTagName("a"),isIE=n.userAgent.indexOf("MSIE")!=-1&&!window.opera;(function(){if(/q=([^&]+)/.test(location.search))})();(function(){var u = G("u1").getElementsByTagName("a"), nv = G("nv").getElementsByTagName("a"), lk = G("lk").getElementsByTagName("a"), un = "";var tj_nv = ["news","tieba","zhidao","mp3","img","video","map"];var tj_lk = ["baike","wenku","hao123","more"];un = bds.comm.user == "" ? "" : bds.comm.user;function _addTJ(obj));}});}for(var i = 0; i < u.length; i++)for(var i = 0; i < nv.length; i++)for(var i = 0; i < lk.length; i++)})();(function() {var links = {'tj_news': ['word', 'http://news.baidu.com/ns?tn=news&cl=2&rn=20&ct=1&ie=utf-8'],'tj_tieba': ['kw', 'http://tieba.baidu.com/f?ie=utf-8'],'tj_zhidao': ['word', 'http://zhidao.baidu.com/search?pn=0&rn=10&lm=0'],'tj_mp3': ['key', 'http://music.baidu.com/search?fr=ps&ie=utf-8'],'tj_img': ['word', 'http://image.baidu.com/i?ct=201326592&cl=2&nc=1&lm=-1&st=-1&tn=baiduimage&istype=2&fm=&pv=&z=0&ie=utf-8'],'tj_video': ['word', 'http://video.baidu.com/v?ct=301989888&s=25&ie=utf-8'],'tj_map': ['wd', 'http://map.baidu.com/?newmap=1&ie=utf-8&s=s'],'tj_baike': ['word', 'http://baike.baidu.com/search/word?pic=1&sug=1&enc=utf8'],'tj_wenku': ['word', 'http://wenku.baidu.com/search?ie=utf-8']};var domArr = [G('nv'), G('lk'),G('cp')],kw = G('kw1');for (var i = 0, l = domArr.length; i < l; i++) else }name && ns_c({'fm': 'behs','tab': name,'query': encodeURIComponent(key),'un': encodeURIComponent(bds.comm.user || '') });};}})();};if(window.pageState==0)})();document.cookie = 'IS_STATIC=1;expires=' + new Date(new Date().getTime() + 10*60*1000).toGMTString();</script>
</body></html>

<!DOCTYPE html>
<!--STATUS OK--><html>
<head>
<meta content="text/html;charset=utf-8" http-equiv="content-type"/>
<meta content="IE=Edge" http-equiv="X-UA-Compatible"/>
<link href="//s1.bdstatic.com" rel="dns-prefetch"/>
<link href="//t1.baidu.com" rel="dns-prefetch"/>
<link href="//t2.baidu.com" rel="dns-prefetch"/>
<link href="//t3.baidu.com" rel="dns-prefetch"/>
<link href="//t10.baidu.com" rel="dns-prefetch"/>
<link href="//t11.baidu.com" rel="dns-prefetch"/>
<link href="//t12.baidu.com" rel="dns-prefetch"/>
<link href="//b1.bdstatic.com" rel="dns-prefetch"/>
<title>百度一下,你就知道</title>
<link href="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/home/css/index.css" rel="stylesheet" type="text/css"/>
<!--[if lte IE 8]><style index="index" >#content#m</style><![endif]-->
<!--[if IE 8]><style index="index" >#u1 a.mnav,#u1 a.mnav:visited</style><![endif]-->
<script>var hashMatch = document.location.href.match(/#+(.*wd=[^&].+)/);if (hashMatch && hashMatch[0] && hashMatch[1]) var ns_c = function(){};</script>
<script>function h(obj)</script>
<noscript><meta content="0; url=/baidu.html?from=noscript" http-equiv="refresh"/></noscript>
<script>window._ASYNC_START=new Date().getTime();</script>
</head>
<body link="#0000cc"><div id="wrapper" style="display:none;"><div id="u"><a href="//www.baidu.com/gaoji/preferences.html" onmousedown="return user_c({'fm':'set','tab':'setting','login':'0'})">搜索设置</a>|<a href="/" id="btop" onmousedown="return user_c({'fm':'set','tab':'index','login':'0'})">百度首页</a>|<a href="https://passport.baidu.com/v2/?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2F" id="lb" onclick="return false;" onmousedown="return user_c({'fm':'set','tab':'login'})">登录</a><a class="reg" href="https://passport.baidu.com/v2/?reg&amp;regType=1&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2F" onmousedown="return user_c({'fm':'set','tab':'reg'})" target="_blank">注册</a></div><div id="head"><div class="s_nav"><a class="s_logo" href="/" onmousedown="return c({'fm':'tab','tab':'logo'})"><img alt="到百度首页" border="0" height="38" src="//www.baidu.com/img/baidu_jgylogo3.gif" title="到百度首页" width="117"/></a><div class="s_tab" id="s_tab"><a href="http://news.baidu.com/ns?cl=2&amp;rn=20&amp;tn=news&amp;word=" onmousedown="return c({'fm':'tab','tab':'news'})" wdfield="word">新闻</a> <b>网页</b> <a href="http://tieba.baidu.com/f?kw=&amp;fr=wwwt" onmousedown="return c({'fm':'tab','tab':'tieba'})" wdfield="kw">贴吧</a> <a href="http://zhidao.baidu.com/q?ct=17&amp;pn=0&amp;tn=ikaslist&amp;rn=10&amp;word=&amp;fr=wwwt" onmousedown="return c({'fm':'tab','tab':'zhidao'})" wdfield="word">知道</a> <a href="http://music.baidu.com/search?fr=ps&amp;key=" onmousedown="return c({'fm':'tab','tab':'music'})" wdfield="key">音乐</a> <a href="http://image.baidu.com/i?tn=baiduimage&amp;ps=1&amp;ct=201326592&amp;lm=-1&amp;cl=2&amp;nc=1&amp;word=" onmousedown="return c({'fm':'tab','tab':'pic'})" wdfield="word">图片</a> <a href="http://v.baidu.com/v?ct=301989888&amp;rn=20&amp;pn=0&amp;db=0&amp;s=25&amp;word=" onmousedown="return c({'fm':'tab','tab':'video'})" wdfield="word">视频</a> <a href="http://map.baidu.com/m?word=&amp;fr=ps01000" onmousedown="return c({'fm':'tab','tab':'map'})" wdfield="word">地图</a> <a 
href="http://wenku.baidu.com/search?word=&amp;lm=0&amp;od=0" onmousedown="return c({'fm':'tab','tab':'wenku'})" wdfield="word">文库</a> <a href="//www.baidu.com/more/" onmousedown="return c({'fm':'tab','tab':'more'})">更多»</a></div></div><form action="/s" class="fm" id="form" name="f"><input name="ie" type="hidden" value="utf-8"/><input name="f" type="hidden" value="8"/><input name="rsv_bp" type="hidden" value="1"/><span class="bg s_ipt_wr"><input class="s_ipt" id="kw" maxlength="100" name="wd" value=""/></span><span class="bg s_btn_wr"><input class="bg s_btn" id="su" onmousedown="this.className='bg s_btn s_btn_h'" onmouseout="this.className='bg s_btn'" type="submit" value="百度一下"/></span><span class="tools"><span id="mHolder"><div id="mCon"><span>输入法</span></div><ul id="mMenu"><li><a href="javascript:;" name="ime_hw">手写</a></li><li><a href="javascript:;" name="ime_py">拼音</a></li><li class="ln"></li><li><a href="javascript:;" name="ime_cl">关闭</a></li></ul></span><span class="shouji"><strong>推荐 : </strong><a href="http://w.x.baidu.com/go/mini/8/10000020" onmousedown="return ns_c({'fm':'behs','tab':'bdbrowser'})">百度浏览器,打开网页快2秒!</a></span></span></form></div><div id="content"><div id="u1"><a class="mnav" href="http://news.baidu.com" name="tj_trnews">新闻</a><a class="mnav" href="http://www.hao123.com" name="tj_trhao123">hao123</a><a class="mnav" href="http://map.baidu.com" name="tj_trmap">地图</a><a class="mnav" href="http://v.baidu.com" name="tj_trvideo">视频</a><a class="mnav" href="http://tieba.baidu.com" name="tj_trtieba">贴吧</a><a href="https://passport.baidu.com/v2/?login&amp;tpl=mn&amp;u=http%3A%2F%2Fwww.baidu.com%2F" id="lb" name="tj_login" onclick="return false;">登录</a><a href="//www.baidu.com/gaoji/preferences.html" id="pf" name="tj_settingicon">设置</a><a href="//www.baidu.com/more/" id="bri" name="tj_briicon">更多产品</a></div><div id="m"><p id="lg"><img height="129" src="//www.baidu.com/img/bd_logo.png" width="270"/></p><p id="nv"><a href="http://news.baidu.com">新 闻</a> 
<b>网 页</b> <a href="http://tieba.baidu.com">贴 吧</a> <a href="http://zhidao.baidu.com">知 道</a> <a href="http://music.baidu.com">音 乐</a> <a href="http://image.baidu.com">图 片</a> <a href="http://v.baidu.com">视 频</a> <a href="http://map.baidu.com">地 图</a></p><div id="fm"><form action="/s" class="fm" id="form1" name="f1"><span class="bg s_ipt_wr"><input class="s_ipt" id="kw1" maxlength="100" name="wd" type="text"/></span><input name="rsv_bp" type="hidden" value="0"/><input name="ch" type="hidden" value=""/><input name="tn" type="hidden" value="baidu"/><input name="bar" type="hidden" value=""/><input name="rsv_spt" type="hidden" value="3"/><input name="ie" type="hidden" value="utf-8"/><span class="bg s_btn_wr"><input class="bg s_btn" id="su1" onmousedown="this.className='bg s_btn s_btn_h'" onmouseout="this.className='bg s_btn'" type="submit" value="百度一下"/></span></form><span class="tools"><span id="mHolder1"><div id="mCon1"><span>输入法</span></div></span></span><ul id="mMenu1"><div class="mMenu1-tip-arrow"><em></em><ins></ins></div><li><a href="javascript:;" name="ime_hw">手写</a></li><li><a href="javascript:;" name="ime_py">拼音</a></li><li class="ln"></li><li><a href="javascript:;" name="ime_cl">关闭</a></li></ul></div><p id="lk"><a href="http://baike.baidu.com">百科</a> <a href="http://wenku.baidu.com">文库</a> <a href="http://www.hao123.com">hao123</a><span> | <a href="//www.baidu.com/more/">更多&gt;&gt;</a></span></p><p id="lm"></p></div></div><div id="ftCon"><div id="ftConw"><p id="lh"><a href="/" id="seth" onclick="h(this)" onmousedown="return ns_c({'fm':'behs','tab':'homepage','pos':0})">把百度设为主页</a><a href="//www.baidu.com/cache/sethelp/index.html" id="setf" onmousedown="return ns_c({'fm':'behs','tab':'favorites','pos':0})" target="_blank">把百度设为主页</a><a href="http://home.baidu.com" onmousedown="return ns_c({'fm':'behs','tab':'tj_about'})">关于百度</a><a href="http://ir.baidu.com" onmousedown="return ns_c({'fm':'behs','tab':'tj_about_en'})">About Baidu</a></p><p id="cp">©2018 Baidu 
<a href="/duty/" name="tj_duty">使用百度前必读</a> 京ICP证030173号 <img src="http://s1.bdstatic.com/r/www/cache/static/global/img/gs_237f015b.gif"/></p></div></div><div id="wrapper_wrapper"></div></div><div class="c-tips-container" id="c-tips-container"></div>
<script>window.__async_strategy=2;</script>
<script>var bds=,su:},util:{},use:{},comm : ,_base64:};var name,navigate,al_arr=[];var selfOpen = window.open;eval("var open = selfOpen;");var isIE=navigator.userAgent.indexOf("MSIE")!=-1&&!window.opera;var E = bds.ecom= {};bds.se.mon = {'loadedItems':[],'load':function(){},'srvt':-1};try catch(e){}</script>
<script>if(!location.hash.match(/[^a-zA-Z0-9]wd=/))catch(e){}},0);}</script>
<script src="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/protocol/https/jquery/jquery-1.10.2.min_f2fb5194.js" type="text/javascript"></script>
<script>(function(){var index_content = $('#content');var index_foot= $('#ftCon');var index_css= $('head [index]');var index_u= $('#u1');var result_u= $('#u');var wrapper=$("#wrapper");window.index_on=function()setTimeout(function(){try{$('#kw1').get(0).focus();window.sugIndex.start();}catch(e){}},0);if(typeof initIndex=='function')};window.index_off=function();})();</script>
<script>window.__switch_add_mask=1;</script>
<script src="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/global/js/instant_search_newi_redirect1_20bf4036.js" type="text/javascript"></script>
<script>initPreload();$("#u,#u1").delegate("#lb",'click',function(){trycatch(e){}});if(navigator.cookieEnabled)</script>
<script>$(function(){for(i=0;i<3;i++)function u(iptwr,ipt,btnwr,btn){if(iptwr && ipt)).on('mouseout',function()).on('click',function());ipt.on('focus',function()).on('blur',function()).on('render',function(e){var $s = iptwr.parent().find('.bdsug');var l = $s.find('li').length;if(l>=5){$s.addClass('bdsugbg');}else{$s.removeClass('bdsugbg');}});}if(btnwr && btn)).on('mouseout',function());}}});</script>
<script src="https://ss1.bdstatic.com/5eN1bjq8AAUYm2zgoY3K/r/www/cache/static/home/js/bri_7f1fa703.js" type="text/javascript"></script>
<script>(function(){var _init=false;window.initIndex=function(){if(_init){return;}_init=true;var w=window,d=document,n=navigator,k=d.f1.wd,a=d.getElementById("nv").getElementsByTagName("a"),isIE=n.userAgent.indexOf("MSIE")!=-1&&!window.opera;(function(){if(/q=([^&]+)/.test(location.search))})();(function(){var u = G("u1").getElementsByTagName("a"), nv = G("nv").getElementsByTagName("a"), lk = G("lk").getElementsByTagName("a"), un = "";var tj_nv = ["news","tieba","zhidao","mp3","img","video","map"];var tj_lk = ["baike","wenku","hao123","more"];un = bds.comm.user == "" ? "" : bds.comm.user;function _addTJ(obj));}});}for(var i = 0; i < u.length; i++)for(var i = 0; i < nv.length; i++)for(var i = 0; i < lk.length; i++)})();(function() {var links = {'tj_news': ['word', 'http://news.baidu.com/ns?tn=news&cl=2&rn=20&ct=1&ie=utf-8'],'tj_tieba': ['kw', 'http://tieba.baidu.com/f?ie=utf-8'],'tj_zhidao': ['word', 'http://zhidao.baidu.com/search?pn=0&rn=10&lm=0'],'tj_mp3': ['key', 'http://music.baidu.com/search?fr=ps&ie=utf-8'],'tj_img': ['word', 'http://image.baidu.com/i?ct=201326592&cl=2&nc=1&lm=-1&st=-1&tn=baiduimage&istype=2&fm=&pv=&z=0&ie=utf-8'],'tj_video': ['word', 'http://video.baidu.com/v?ct=301989888&s=25&ie=utf-8'],'tj_map': ['wd', 'http://map.baidu.com/?newmap=1&ie=utf-8&s=s'],'tj_baike': ['word', 'http://baike.baidu.com/search/word?pic=1&sug=1&enc=utf8'],'tj_wenku': ['word', 'http://wenku.baidu.com/search?ie=utf-8']};var domArr = [G('nv'), G('lk'),G('cp')],kw = G('kw1');for (var i = 0, l = domArr.length; i < l; i++) else }name && ns_c({'fm': 'behs','tab': name,'query': encodeURIComponent(key),'un': encodeURIComponent(bds.comm.user || '') });};}})();};if(window.pageState==0)})();document.cookie = 'IS_STATIC=1;expires=' + new Date(new Date().getTime() + 10*60*1000).toGMTString();</script>
</body></html>


Process finished with exit code 0

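Both decoding methods work here. A plain decode() is more fragile, though: it assumes UTF-8, so it tends to raise an error on more complicated pages, in particular ones that use another encoding.

Take this JD.com page, for example: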

url = 'https://item.jd.com/6072622.html'

After replacing urltest with url and running the code, the result is:

UnicodeDecodeError: 'utf-8' codec can't decode byte 0xa1 in position 146: invalid start byte
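The byte 0xa1 is not a valid UTF-8 start byte, which suggests the page is served in another encoding (possibly GBK; this is an assumption, as the original run does not say). One way to make decoding more tolerant is sketched below. The helper decode_html and its parameters are hypothetical and not part of the original program. It tries any charset declared in the Content-Type header or in a meta tag, then falls back to a short list of guesses.

```python
import re


def decode_html(data, content_type=None, fallbacks=("utf-8", "gbk")):
    """Decode HTML bytes, trying declared encodings before the fallbacks.

    Hypothetical helper for illustration; the fallback list is a guess.
    """
    candidates = []
    # 1. charset declared in the HTTP Content-Type header, if provided.
    if content_type:
        match = re.search(r"charset=([\w-]+)", content_type, re.I)
        if match:
            candidates.append(match.group(1))
    # 2. charset declared in a <meta> tag near the top of the document.
    match = re.search(rb"charset=[\"']?([\w-]+)", data[:2048], re.I)
    if match:
        candidates.append(match.group(1).decode("ascii"))
    candidates.extend(fallbacks)
    for encoding in candidates:
        try:
            return data.decode(encoding)
        except (UnicodeDecodeError, LookupError):
            continue  # wrong or unknown encoding; try the next one
    # Last resort: never raise, but mark undecodable bytes with U+FFFD.
    return data.decode("utf-8", errors="replace")
```

With the urllib3 response from httpget(), this could be called as decode_html(page.data, page.headers.get('Content-Type')). It is a heuristic: a wrong fallback can decode without error and still produce garbled text, which is why letting BeautifulSoup detect the encoding, as the function above already does, is often the simpler choice.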

 
