Changeset 276

Timestamp:

Wed Dec 21 22:02:46 2005

Author:

osmond

Message:

Files:

zh-translations/branches/diveintopython-zh-5.4/zh-cn/xml/openanything.xml (modified) (diff)

zh-translations/branches/diveintopython-zh-5.4/zh-cn/xml/openanything.xml

r201	r276
2	2	<chapter id="oa">
3	3	<?dbhtml filename="http_web_services/index.html"?>
4		<title>HTTP Web ~~Services~~</title>
	4	<title>HTTP Web 服务</title>
4	4	<titleabbrev id="oa.numberonly">Chapter 11</titleabbrev>
5	5	<section id="oa.divein">
6		<title>~~Diving in~~</title>
	6	<title>概览</title>
6	6	<abstract>
7	7	<title/>
8		<para>You've learned about <link linkend="dialect">HTML processing</link> and <link linkend="kgp">XML processing</link>, and along the way you saw <link linkend="dialect.extract.urllib">how to download a web page</link> and <link linkend="kgp.openanything.urllib">how to parse XML from a URL</link>, but let's dive into the more general topic of HTTP web services.</para>
	8	<para>你已经学习了关于 <link linkend="dialect">HTML 处理</link> 和 <link linkend="kgp">XML 处理</link>, 以及 <link linkend="dialect.extract.urllib">如何下载 web 页</link> 和 <link linkend="kgp.openanything.urllib">如何从 URL 解析 XML</link>, 接下来让我们来探讨有关 HTTP web 服务的更全面的主题。</para>
8	8	</abstract>
9		<para>Simply stated, HTTP web services are programmatic ways of sending and receiving data from remote servers using the operations of HTTP directly. If you want to get data from the server, use a straight HTTP GET; if you want to send new data to the server, use HTTP POST. (Some more advanced HTTP web service APIs also define ways of modifying existing data and deleting data, using HTTP PUT and HTTP DELETE.) In other words, the <quote>verbs</quote> built into the HTTP protocol (GET, POST, PUT, and DELETE) map directly to application-level operations for receiving, sending, modifying, and deleting data.</para>
10		<para>The main advantage of this approach is simplicity, and its simplicity has proven popular with a lot of different sites. Data -- usually XML data -- can be built and stored statically, or generated dynamically by a server-side script, and all major languages include an HTTP library for downloading it. Debugging is also easier, because you can load up the web service in any web browser and see the raw data. Modern browsers will even nicely format and pretty-print XML data for you, to allow you to quickly navigate through it.</para>
11		<para>Examples of pure XML-over-HTTP web services:</para>
	9	<para>简单地讲, HTTP web 服务直接使用 HTTP 操作从远程服务器按部就班地发送和接收数据。如果你要从服务器获取数据, 直接使用 HTTP GET; 如果您想发送新数据到服务器, 使用 HTTP POST。(一些较高级的 HTTP web 服务 API 也定义了一些使用 HTTP PUT 和 HTTP DELETE 修改现有数据和删除现有数据的方法。) 换句话说, 构建在 HTTP 协议中的 <quote>verbs</quote> (GET, POST, PUT 和 DELETE) 直接映射为接收, 发送, 修改和删除等应用级别的操作。</para>
	10	<para>这种方法的主要优势是简单, 并且这种简单性已经被许多不同的站点所接受。数据 -- 通常是 XML 数据 -- 能静态创建和存储, 或使用服务器端脚本动态生成, 并且所有主流的计算机语言都包含了下载这些数据的 HTTP 库。调试也更简单, 因为您可以在任何浏览器中加载 web 服务并查看原始数据。现代的浏览器甚至可以为您良好地格式化并漂亮地显示这些 XML 数据, 让您能够快速地浏览它。</para>
	11	<para>HTTP web 服务上的纯 XML 应用举例:</para>
12	12	<itemizedlist>
13		<listitem><para><ulink url="http://www.amazon.com/webservices">Amazon API</ulink> allows you to retrieve product information from the Amazon.com online store.</para></listitem>
14		<listitem><para><ulink url="http://www.nws.noaa.gov/alerts/">National Weather Service</ulink> (United States) and <ulink url="http://demo.xml.weather.gov.hk/">Hong Kong Observatory</ulink> (Hong Kong) offer weather alerts as a web service.</para></listitem>
15		<listitem><para><ulink url="http://atomenabled.org/">Atom API</ulink> for managing web-based content.</para></listitem>
16		<listitem><para><ulink url="http://syndic8.com/">Syndicated feeds</ulink> from weblogs and news sites bring you up-to-the-minute news from a variety of sites.</para></listitem>
	13	<listitem><para><ulink url="http://www.amazon.com/webservices">Amazon API</ulink> 允许您从 Amazon.com 在线商店获取产品信息。</para></listitem>
	14	<listitem><para><ulink url="http://www.nws.noaa.gov/alerts/">National Weather Service</ulink> (美国) 和 <ulink url="http://demo.xml.weather.gov.hk/">Hong Kong Observatory</ulink> (香港) 用 web 服务提供天气警报。</para></listitem>
	15	<listitem><para><ulink url="http://atomenabled.org/">Atom API</ulink> 用来管理基于 web 的内容。</para></listitem>
	16	<listitem><para><ulink url="http://syndic8.com/">Syndicated feeds</ulink> 从 weblogs 和新闻站点获取来自各式各样站点的最新的消息。</para></listitem>
17	17	</itemizedlist>
18		<para>In later chapters, you'll explore APIs which use HTTP as a transport for sending and receiving data, but don't map application semantics to the underlying HTTP semantics. (They tunnel everything over HTTP POST.) But this chapter will concentrate on using HTTP GET to get data from a remote server, and you'll explore several HTTP features you can use to get the maximum benefit out of pure HTTP web services.</para>
19		<para>Here is a more advanced version of the <filename class="headerfile">openanything</filename> module that you saw in <link linkend="streams">the previous chapter</link>:</para>
	18	<para>在后面的几章里, 我们将探索使用 HTTP 作为数据发送和接收传输手段的 API, 但它们不会将应用语义映射到底层的 HTTP 语义。 (它们把所有内容都通过 HTTP POST 这个管道传输。) 但是本章将关注使用 HTTP GET 从远程服务器获取数据, 并且将探索几个可以让你从纯 HTTP web 服务中获得最大收益的 HTTP 特性。</para>
	19	<para>如下所示为 <link linkend="streams">上一章</link> 看到的 <filename class="headerfile">openanything</filename> 模块的较高级版本 :</para>
20	20	<example>
21	21	<title>&openanything_filename;</title>
…	…
103	103	</example>
104	104	<itemizedlist role="furtherreading">
105		<title>Further reading</title>
106		<listitem><para>Paul Prescod believes that <ulink url="http://webservices.xml.com/pub/a/ws/2002/02/06/rest.html">pure HTTP web services are the future of the Internet</ulink>.</para></listitem>
	105	<title>进一步阅读</title>
	106	<listitem><para>Paul Prescod 认为 <ulink url="http://webservices.xml.com/pub/a/ws/2002/02/06/rest.html">纯 HTTP web 服务是 Internet 的未来</ulink>。</para></listitem>
107	107	</itemizedlist>
108	108	</section>
109	109	<section id="oa.review">
110	110	<?dbhtml filename="http_web_services/review.html"?>
111		<title>~~How not to fetch data over HTTP~~</title>
	111	<title>不应该怎样通过 HTTP 获取数据</title>
111	111	<abstract>
112	112	<title/>
113		<para>Let's say you want to download a resource over HTTP, such as a syndicated Atom feed. But you don't just want to download it once; you want to download it over and over again, every hour, to get the latest news from the site that's offering the news feed. Let's do it the quick-and-dirty way first, and then see how you can do better.</para>
	113	<para>假如说你想用 HTTP 下载资源, 例如一个 Atom feed 汇聚。但是你不仅仅想下载一次; 而是想一次又一次地下载它, 如每小时一次, 通过站点提供的 news feed 来获得最新的消息。让我们首先用一种快速且愚笨的方法来实现它, 然后看看如何改进它。
	114	</para>
114	115	</abstract>
115	116	<example>
116		<title>~~Downloading a feed the quick-and-dirty way~~</title>
	117	<title>用一种快速且愚笨的方法下载 feed</title>
116	117	<screen>
117	118	&prompt;<userinput>import urllib</userinput>
…	…
131	132	<calloutlist>
132	133	<callout arearefs="oa.review.1.1">
133		<para>Downloading anything over HTTP is incredibly easy in &python;; in fact, it's a one-liner. The &urllib; module has a handy &urlopen; function that takes the address of the page you want, and returns a file-like object that you can just <function>read()</function> from to get the full contents of the page. It just can't get much easier.</para>
	134	<para>&python; 使用 HTTP 下载任何东西简单到令人难以置信; 实际上, 只需要一行代码。 &urllib; 模块有一个便利的 &urlopen; 函数，它接受您所要获取的页面地址, 然后返回一个类似文件的对象，您仅仅使用 <function>read()</function> 便可获得页面的全部内容。这再简单不过了。
	135	</para>
134	136	</callout>
135	137	</calloutlist>
136	138	</example>
137		<para>So what's wrong with this? Well, for a quick one-off during testing or development, there's nothing wrong with it. I do it all the time. I wanted the contents of the feed, and I got the contents of the feed. The same technique works for any web page. But once you start thinking in terms of a web service that you want to access on a regular basis -- and remember, you said you were planning on retrieving this syndicated feed once an hour -- then you're being inefficient, and you're being rude.</para>
138		<para>Let's talk about some of the basic features of HTTP.</para>
	139	<para>那么这种方法有何不妥之处吗? 当然, 在测试或开发中一次性的使用没有什么不妥。我经常这样做。我想要 feed 的内容, 我就获得了 feed 的内容。同样的技术也适用于任何 web 页面。但是一旦你开始考虑一个需要定期访问的 web 服务 -- 并且记住, 你说过你计划每小时获取一次这个 feed -- 那么这样做效率实在是太低了, 并且对服务器来说过于粗暴无礼。
	140	</para>
	141	<para>下面先谈论一些 HTTP 的基本特性。</para>
139	142	</section>
140	143
141	144	<section id="oa.features">
142	145	<?dbhtml filename="http_web_services/http_features.html"?>
143		<title>~~Features of HTTP~~</title>
	146	<title>HTTP 的特性</title>
143	146	<abstract>
144	147	<title/>
145		<para>~~There are five important features of HTTP which you should support.~~</para>
	148	<para>这里有你必须关注的有关 HTTP 的五个重要特性。</para>
145	148	</abstract>
146	149	<section>
147		<title>&useragent;</title>
148		<para>The &useragent; is simply a way for a client to tell a server who it is when it requests a web page, a syndicated feed, or any sort of web service over HTTP. When the client requests a resource, it should always announce who it is, as specifically as possible. This allows the server-side administrator to get in touch with the client-side developer if anything is going fantastically wrong.</para>
149		<para>By default, &python; sends a generic &useragent;: <literal>Python-urllib/1.15</literal>. In the next section, you'll see how to change this to something more specific.</para>
	150	<title>用户代理 (&useragent;)</title>
	151	<para>&useragent; 是客户端在通过 HTTP 请求 web 页, feed 汇聚或任何类型的 web 服务时, 告知服务器自己是谁的一种简单方式。当客户端请求一个资源时, 应该总是尽可能明确地表明自己的身份。这样, 当出现严重错误时, 服务器端的管理员就可以与客户端的开发者取得联系。
	152	</para>
	153	<para>默认情况下 &python; 发送一个通用的 &useragent;: <literal>Python-urllib/1.15</literal>。下一节, 您将看到如何把它改为更具体的内容。</para>
150	154	</section>
151	155	<section>
152		<title>Redirects</title>
153		<para>Sometimes resources move around. Web sites get reorganized, pages move to new addresses. Even web services can reorganize. A syndicated feed at <literal>http://example.com/index.xml</literal> might be moved to <literal>http://example.com/xml/atom.xml</literal>. Or an entire domain might move, as an organization expands and reorganizes; for instance, <literal>http://www.example.com/index.xml</literal> might be redirected to <literal>http://server-farm-1.example.com/index.xml</literal>.</para>
154		<para>Every time you request any kind of resource from an HTTP server, the server includes a status code in its response. Status code <literal>200</literal> means <quote>everything's normal, here's the page you asked for</quote>. Status code <literal>404</literal> means <quote>page not found</quote>. (You've probably seen 404 errors while browsing the web.)</para>
155		<para>HTTP has two different ways of signifying that a resource has moved. Status code <literal>302</literal> is a <emphasis>temporary redirect</emphasis>; it means <quote>oops, that got moved over here temporarily</quote> (and then gives the temporary address in a <literal>Location:</literal> header). Status code <literal>301</literal> is a <emphasis>permanent redirect</emphasis>; it means <quote>oops, that got moved permanently</quote> (and then gives the new address in a <literal>Location:</literal> header). If you get a <literal>302</literal> status code and a new address, the HTTP specification says you should use the new address to get what you asked for, but the next time you want to access the same resource, you should retry the old address. But if you get a <literal>301</literal> status code and a new address, you're supposed to use the new address from then on.</para>
156		<para><function>urllib.urlopen</function> will automatically <quote>follow</quote> redirects when it receives the appropriate status code from the HTTP server, but unfortunately, it doesn't tell you when it does so. You'll end up getting data you asked for, but you'll never know that the underlying library <quote>helpfully</quote> followed a redirect for you. So you'll continue pounding away at the old address, and each time you'll get redirected to the new address. That's two round trips instead of one: not very efficient! Later in this chapter, you'll see how to work around this so you can deal with permanent redirects properly and efficiently.</para>
	156	<title>重定向 (Redirects)</title>
	157	<para>有时资源移来移去。 Web 站点重组内容, 页面移动到了新的地址。甚至是 web 服务重组。原来位于 <literal>http://example.com/index.xml</literal> 的 feed 汇聚可能被移动到 <literal>http://example.com/xml/atom.xml</literal>。或者因为一个机构扩展或重组，整个域可能移动。例如, <literal>http://www.example.com/index.xml</literal> 可能被重定向到 <literal>http://server-farm-1.example.com/index.xml</literal>。</para>
	158	<para>您每次从 HTTP 服务器请求任何类型的资源时, 服务器的响应中均包含一个状态代码。状态代码 <literal>200</literal> 的意思是 <quote>一切正常, 这就是您请求的页面</quote>。状态代码 <literal>404</literal> 的意思是 <quote>页面没找到</quote>。 (当浏览 web 时，你可能看到过 404 errors。)</para>
	159	<para>HTTP 有两种不同的方法表示资源已经被移动。状态代码 <literal>302</literal> 表示 <emphasis>临时重定向</emphasis>; 这意谓着 <quote>哎呀, 访问内容被临时移动</quote> (然后在 <literal>Location:</literal> 头部给出临时地址)。状态代码 <literal>301</literal> 表示 <emphasis>永久重定向</emphasis>; 这意谓着 <quote>哎呀, 访问内容被永久移动</quote> (然后在 <literal>Location:</literal> 头部给出新地址)。如果您获得了一个 <literal>302</literal> 状态代码和一个新地址, HTTP 规范说您应该使用新地址获取您所请求的内容, 但是下次您要访问同一资源时, 应该使用旧地址重试。但是如果您获得了一个 <literal>301</literal> 状态代码和一个新地址, 您应该从那时起使用新地址。</para>
	160	<para>当从 HTTP 服务器接受到一个适当的状态代码时, <function>urllib.urlopen</function> 将自动 <quote>跟踪</quote> 重定向, 但不幸的是, 当它做了重定向时不会告诉你。
	161	你最终会获得所请求的数据, 却永远不会知道底层的库 <quote>好心地</quote> 帮你跟踪了一次重定向。因此你将继续不断地使用旧地址, 并且每次都会被重定向到新地址。这一过程要往返两次而不是一次: 太没效率了! 本章的后面, 您将看到如何改进这一点, 从而适当且有效率地处理永久重定向。</para>
157	162	</section>
158	163	<section>
159	164	<title>&lastmodified;/&ifmodifiedsince;</title>
160		<para>Some data changes all the time. The home page of CNN.com is constantly updating every few minutes. On the other hand, the home page of Google.com only changes once every few weeks (when they put up a special holiday logo, or advertise a new service). Web services are no different; usually the server knows when the data you requested last changed, and HTTP provides a way for the server to include this last-modified date along with the data you requested.</para>
161		<para>If you ask for the same data a second time (or third, or fourth), you can tell the server the last-modified date that you got last time: you send an <literal>If-Modified-Since</literal> header with your request, with the date you got back from the server last time. If the data hasn't changed since then, the server sends back a special HTTP status code <literal>304</literal>, which means <quote>this data hasn't changed since the last time you asked for it</quote>. Why is this an improvement? Because when the server sends a <literal>304</literal>, <emphasis>it doesn't re-send the data</emphasis>. All you get is the status code. So you don't need to download the same data over and over again if it hasn't changed; the server assumes you have the data cached locally.</para>
162		<para>All modern web browsers support last-modified date checking. If you've ever visited a page, re-visited the same page a day later and found that it hadn't changed, and wondered why it loaded so quickly the second time -- this could be why. Your web browser cached the contents of the page locally the first time, and when you visited the second time, your browser automatically sent the last-modified date it got from the server the first time. The server simply says <literal>304: Not Modified</literal>, so your browser knows to load the page from its cache. Web services can be this smart too.</para>
163		<para>&python;'s URL library has no built-in support for last-modified date checking, but since you can add arbitrary headers to each request and read arbitrary headers in each response, you can add support for it yourself.</para>
	165	<para>有些数据随时都在变化。 CNN.com 的主页经常没几分钟就更新。另一方面, Google.com 的主页几个星期才更新一次 (当他们上传特殊的假日 logo, 或为一个新服务作广告时)。
	166	Web 服务也不例外; 通常服务器知道你所请求的数据最后一次修改的时间, 并且 HTTP 为服务器提供了一种将这个最近修改日期连同你请求的数据一同发送的方法。</para>
	167	<para>如果你第二次 (或第三次, 或第四次) 请求相同的数据, 你可以告诉服务器你上一次获得的最后修改日期: 在你的请求中发送了一个 <literal>If-Modified-Since</literal> 头信息, 它包含了上一次从服务器连同数据所获得的日期。如果数据从那时起没有改变, 服务器将返回一个特殊的 HTTP 状态代码 <literal>304</literal>, 这意谓着 <quote>从上一次请求后这个数据没有改变</quote>。为什么这一点很重要? 因为当服务器发送状态编码 <literal>304</literal> 时, <emphasis>不再重新发送数据</emphasis>。您仅仅获得了这个状态代码。所以当数据没有更新时，你不需要一次又一次地下载相同的数据; 服务器假定你有本地的缓存数据。</para>
	168	<para>所有现代的浏览器都支持最近修改的数据检查。如果你曾经访问过某页, 一天后重新访问相同的页时发现它没有变化, 并奇怪第二次访问时页面加载得如此之快 -- 这就是原因所在。你的浏览器首次访问会在本地缓存页面内容, 当你第二次访问, 浏览器自动发送首次访问时从服务器获得的最近修改日期。服务器简单的返回 <literal>304: 没有修改</literal>, 因此浏览器就会知道从本地缓存加载页面。在这一点上，Web 服务也会显得很聪明。</para>
	169	<para>&python; 的 URL 库没有提供内置的最近修改数据检查支持, 但是你可以为每一个请求添加任意的头信息并在每一个响应中读取任意头信息, 从而自己添加这种支持。</para>
164	170	</section>
165	171	<section>
166	172	<title>&etag;/&ifnonematch;</title>
167		<para>ETags are an alternate way to accomplish the same thing as the last-modified date checking: don't re-download data that hasn't changed. The way it works is, the server sends some sort of hash of the data (in an <literal>ETag</literal> header) along with the data you requested. Exactly how this hash is determined is entirely up to the server. The second time you request the same data, you include the ETag hash in an <literal>If-None-Match:</literal> header, and if the data hasn't changed, the server will send you back a <literal>304</literal> status code. As with the last-modified date checking, the server <emphasis>just</emphasis> sends the <literal>304</literal>; it doesn't send you the same data a second time. By including the ETag hash in your second request, you're telling the server that there's no need to re-send the same data if it still matches this hash, since you still have the data from the last time.</para>
168		<para>&python;'s URL library has no built-in support for ETags, but you'll see how to add it later in this chapter.</para>
	173	<para>ETag 是实现与最近修改日期检查同样的功能的另一种方法: 没有变化时不重新下载数据。这种方法是这样工作的, 服务器发送你所请求的数据的同时, 发送数据的某种 hash (在 <literal>ETag</literal> 头信息中) 。 hash 如何确定完全取决于服务器。当第二次请求相同的数据时, 你在 <literal>If-None-Match:</literal> 头信息中包含 ETag hash, 如果数据没有改变, 服务器将返回 <literal>304</literal> 状态代码。与最近修改日期检查相同, 服务器 <emphasis>仅仅</emphasis> 发送 <literal>304</literal> 状态代码; 不会第二次为你发送相同的数据。在第二次请求时, 通过包含 ETag hash, 你告诉服务器, 如果 hash 仍旧匹配就没有必要重新发送相同的数据, 因为你还有上一次获得的数据。</para>
	174	<para>&python; 的 URL 库没有对 ETag 的内置支持, 但是在本章后面你将看到如何添加这种支持。</para>
169	175	</section>
170	176	<section>
171		<title>Compression</title>
172		<para>The last important HTTP feature is gzip compression. When you talk about HTTP web services, you're almost always talking about moving XML back and forth over the wire. XML is text, and quite verbose text at that, and text generally compresses well. When you request a resource over HTTP, you can ask the server that, if it has any new data to send you, to please send it in compressed format. You include the <literal>Accept-encoding: gzip</literal> header in your request, and if the server supports compression, it will send you back gzip-compressed data and mark it with a <literal>Content-encoding: gzip</literal> header.</para>
173		<para>&python;'s URL library has no built-in support for gzip compression per se, but you can add arbitrary headers to the request. And &python; comes with a separate &gzip; module, which has functions you can use to decompress the data yourself.</para>
174		<para>Note that <link linkend="oa.review">our little one-line script</link> to download a syndicated feed did not support any of these HTTP features. Let's see how you can improve it.</para>
	177	<title>压缩 (Compression)</title>
	178	<para>最后一个重要的 HTTP 特性是 gzip 压缩。
	179	当谈论 HTTP web 服务时, 几乎总是会谈及在网络线路上来回传输的 XML。XML 是文本, 而且还是相当冗长的文本, 并且文本一般可以被很好地压缩。当你通过 HTTP 请求一个资源时, 可以告诉服务器, 如果它有任何新数据要发送给你, 请以压缩的格式发送。在你的请求中包含 <literal>Accept-encoding: gzip</literal> 头信息, 如果服务器支持压缩, 它将返回由 gzip 压缩的数据并且使用 <literal>Content-encoding: gzip</literal> 头信息标记。</para>
	180	<para>&python; 的 URL 库本身没有内置对 gzip 压缩的支持, 但是你能为请求添加任意的头信息。 &python; 还提供了一个独立的 &gzip; 模块, 它提供了对数据进行解压缩的功能。</para>
	181	<para>注意, 我们用来下载 feed 汇聚的 <link linkend="oa.review">短小的单行脚本</link> 并不支持这些 HTTP 特性中的任何一个。让我们来看看如何改善它。</para>
175	182	</section>
176	183	</section>
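The headers described in this section can be combined on a single request. The following is a minimal sketch in Python 3, where the `urllib2` module discussed in this chapter became `urllib.request`; it is not the chapter's own `openanything` module. The URL, the User-Agent string, the date, and the helper names `build_polite_request` and `decode_body` are hypothetical placeholders. The request is only built, never sent.

```python
import gzip
import io
import urllib.request

# Hypothetical values for illustration only; they are not taken from the chapter.
FEED_URL = "http://example.com/xml/atom.xml"
USER_AGENT = "ExampleFeedReader/1.0 +http://example.com/reader/"


def build_polite_request(url, last_modified=None, etag=None):
    """Build (but do not send) a GET request carrying the headers described above."""
    request = urllib.request.Request(url)
    request.add_header("User-Agent", USER_AGENT)
    request.add_header("Accept-encoding", "gzip")
    if last_modified:
        # Date the server sent in Last-Modified on an earlier response.
        request.add_header("If-Modified-Since", last_modified)
    if etag:
        # Opaque hash the server sent in ETag on an earlier response.
        request.add_header("If-None-Match", etag)
    return request


def decode_body(raw_bytes, content_encoding):
    """Decompress the body if the server marked it with Content-encoding: gzip."""
    if content_encoding and content_encoding.lower() == "gzip":
        return gzip.GzipFile(fileobj=io.BytesIO(raw_bytes)).read()
    return raw_bytes


request = build_polite_request(
    FEED_URL,
    last_modified="Wed, 21 Dec 2005 22:02:46 GMT",  # placeholder date
    etag='"e8284-68e0-4de30f80"',
)
```

Keeping the request construction separate from the network call makes it easy to inspect exactly which headers will go over the wire before anything is sent.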
…	…
180	187	<section id="oa.debug">
181	188	<?dbhtml filename="http_web_services/debugging.html"?>
182		<title>~~Debugging HTTP web services~~</title>
	189	<title>调试 HTTP web 服务</title>
182	189	<abstract>
183	190	<title/>
184		<para>~~First, let's turn on the debugging features of &python;'s HTTP library and see what's being sent over the wire. This will be useful throughout the chapter, as you add more and more features.~~</para>
	191	<para>首先, 让我们开启 &python; HTTP 库的调试特性并查看网络线路上的传输过程。这对本章的全部内容都很有用, 因为你将添加越来越多的特性。</para>
184	191	</abstract>
185	192	<example>
186		<title>~~Debugging~~ HTTP</title>
	193	<title>调试 HTTP</title>
186	193	<screen>
187	194	&prompt;<userinput>import httplib</userinput>
…	…
211	218	<calloutlist>
212	219	<callout arearefs="oa.debug.1.1">
213		<para>&urllib; relies on another standard &python; library, &httplib;. Normally you don't need to <literal>import httplib</literal> directly (&urllib; does that automatically), but you will here so you can set the debugging flag on the <classname>HTTPConnection</classname> class that &urllib; uses internally to connect to the HTTP server. This is an incredibly useful technique. Some other &python; libraries have similar debug flags, but there's no particular standard for naming them or turning them on; you need to read the documentation of each library to see if such a feature is available.</para>
	220	<para>&urllib; 依赖于另一个 &python; 的标准库, &httplib;。通常你不必显式地给出 <literal>import httplib</literal> (&urllib; 会自动导入), 但是这里需要这样做, 以便在 &urllib; 内部用来连接 HTTP 服务器的 <classname>HTTPConnection</classname> 类上设置调试标记。这是一种非常有用的技术。 &python; 其他的一些库也有类似的调试标记, 但是
	221	并没有命名和开启它们的特定标准; 你需要阅读每一个库的文档来查看是否有类似的特性可用。</para>
214	222	</callout>
215	223	<callout arearefs="oa.debug.1.2">
216		<para>Now that the debugging flag is set, information on the HTTP request and response is printed out in real time. The first thing it tells you is that you're connecting to the server <literal>diveintomark.org</literal> on port 80, which is the standard port for HTTP.</para>
	224	<para>既然已经设置了调试标记, HTTP 的请求和响应信息会实时地被打印出来。首先告诉你的是你连接服务器<literal>diveintomark.org</literal> 的 80 端口, 这是 HTTP 的标准端口。</para>
216	224	</callout>
217	225	<callout arearefs="oa.debug.1.3">
218		<para>When you request the Atom feed, &urllib; sends three lines to the server. The first line specifies the HTTP verb you're using, and the path of the resource (minus the domain name). All the requests in this chapter will use <literal>GET</literal>, but in the next chapter on &soap;, you'll see that it uses <literal>POST</literal> for everything. The basic syntax is the same, regardless of the verb.</para>
	226	<para>当你请求 Atom feed 时, &urllib; 向服务器发送三行信息。第一行指出你使用的 HTTP verb, 和资源的路径 (除去域名)。在本章中所有的请求都将使用 <literal>GET</literal>, 但是在下一章的 &soap; 中, 你会看到所有的请求都使用 <literal>POST</literal> 。除了请求的动词不同之外, 基本的语法是相同的。</para>
218	226	</callout>
219	227	<callout arearefs="oa.debug.1.4">
220		<para>The second line is the <literal>Host</literal> header, which specifies the domain name of the service you're accessing. This is important, because a single HTTP server can host multiple separate domains. My server currently hosts 12 domains; other servers can host hundreds or even thousands.</para>
	228	<para>第二行是 <literal>Host</literal> 头信息, 它指出你所访问的服务的域名。这一点很重要, 因为一个独立的 HTTP 服务器可以服务于多个不同的域。当前我的服务器服务于 12 个域; 其他的服务器可以服务于成百乃至上千个域。</para>
220	228	</callout>
221	229	<callout arearefs="oa.debug.1.5">
222		<para>The third line is the &useragent; header. What you see here is the generic &useragent; that the &urllib; library adds by default. In the next section, you'll see how to customize this to be more specific.</para>
	230	<para>第三行是 &useragent; 头信息。在此你看到的是由 &urllib; 库默认添加的普通的 &useragent; 。在下一节, 你会看到如何自定义它的更多细节。</para>
222	230	</callout>
223	231	<callout arearefs="oa.debug.1.6">
224		<para>The server replies with a status code and a bunch of headers (and possibly some data, which got stored in the <varname>feeddata</varname> variable). The status code here is <literal>200</literal>, meaning <quote>everything's normal, here's the data you requested</quote>. The server also tells you the date it responded to your request, some information about the server itself, and the content type of the data it's giving you. Depending on your application, this might be useful, or not. It's certainly reassuring that you thought you were asking for an Atom feed, and lo and behold, you're getting an Atom feed (<literal>application/atom+xml</literal>, which is the registered content type for Atom feeds).</para>
	232	<para>服务器用状态代码和一系列的头信息答复 (可能还有一些数据, 它们被存储到了 <varname>feeddata</varname> 变量中)。这里的状态代码是 <literal>200</literal>, 意谓着 <quote>一切正常, 这就是您请求的数据</quote>。服务器也会告诉你它响应请求的日期, 一些有关服务器自身的信息, 以及传给你的数据的内容类型。根据你的应用不同, 这或许有用, 或许没用。这充分确认了你所请求的是一个 Atom feed, 瞧, 你获得了 Atom feed (<literal>application/atom+xml</literal>, 它是已经注册的有关 Atom feeds 的内容类型)。</para>
224	232	</callout>
225	233	<callout arearefs="oa.debug.1.7">
226		<para>The server tells you when this Atom feed was last modified (in this case, about 13 minutes ago). You can send this date back to the server the next time you request the same feed, and the server can do last-modified checking.</para>
	234	<para>服务器会告诉你此 Atom feed 最后修改的时间 (本例中, 大约在 13 分钟之前)。当下次请求同样的 feed 时, 你可以把这个日期再发送给服务器, 服务器将做最近修改日期检查。</para>
226	234	</callout>
227	235	<callout arearefs="oa.debug.1.8">
228		<para>The server also tells you that this Atom feed has an ETag hash of <literal>"e8284-68e0-4de30f80"</literal>. The hash doesn't mean anything by itself; there's nothing you can do with it, except send it back to the server the next time you request this same feed. Then the server can use it to tell you if the data has changed or not.</para>
	236	<para>服务器也会告诉你这个 Atom feed 有一个值为 <literal>"e8284-68e0-4de30f80"</literal> 的 ETag hash。这个 hash 自身没有任何意义; 除了在下次访问相同的 feed 时将它送还给服务器之外, 你不能用它做什么。然后服务器使用它告诉你数据是否已经改变。</para>
228	236	</callout>
229	237	</calloutlist>
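The callouts above describe the Python 2 technique of setting a flag on `httplib.HTTPConnection`. As a hedged Python 3 counterpart (an assumption about how a reader might reproduce the same trace today, not something the chapter itself shows), `urllib.request.HTTPHandler` accepts a `debuglevel` argument. The URL in the comment is a placeholder, and nothing is sent over the network here.

```python
import urllib.request

# A non-zero debuglevel makes the underlying http.client connection print
# the request line, the request headers, and the response headers.
debug_handler = urllib.request.HTTPHandler(debuglevel=1)

# build_opener uses the supplied handler in place of its default HTTP handler.
opener = urllib.request.build_opener(debug_handler)

# Nothing is sent until the opener is asked to open a URL, for example:
# feeddata = opener.open("http://example.com/xml/atom.xml").read()
```

Passing the handler explicitly keeps the debugging switch local to one opener instead of changing behavior for every connection in the process.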
…	…
239	247	<section id="oa.useragent">
240	248	<?dbhtml filename="http_web_services/user_agent.html"?>
241		<title>~~Setting the~~ <literal>User-Agent</literal></title>
	249	<title>设置 <literal>User-Agent</literal></title>
241	249	<abstract>
242	250	<title/>
243		<para>~~The first step to improving your HTTP web services client is to identify yourself properly with a &useragent;. To do that, you need to move beyond the basic &urllib; and dive into &urllib2;.~~</para>
	251	<para>改善你的 HTTP web 服务客户端的第一步就是用 &useragent; 适当地表明你自己的身份。为了做到这一点, 你需要超越基本的 &urllib; 而深入到 &urllib2;。</para>
243	251	</abstract>
244	252	<example>
245		<title>~~Introducing &urllib2;~~</title>
	253	<title>&urllib2; 介绍</title>
245	253	<screen>
246	254	&prompt;<userinput>import httplib</userinput>
…	…
271	279	<calloutlist>
272	280	<callout arearefs="oa.useragent.1.1">
273		<para>If you still have your &python; &ide; open from the previous section's example, you can skip this, but this turns on <link linkend="oa.debug">HTTP debugging</link> so you can see what you're actually sending over the wire, and what gets sent back.</para>
	281	<para>如果你的 &python; &ide; 仍旧为上一节的例子而打开着, 你可以略过这一步; 这一步开启 <link linkend="oa.debug">HTTP 调试</link>, 让你能看到网络线路上实际发送的内容以及返回的内容。</para>
273	281	</callout>
274	282	<callout arearefs="oa.useragent.1.2">
275		<para>Fetching an HTTP resource with &urllib2; is a three-step process, for good reasons that will become clear shortly. The first step is to create a <classname>Request</classname> object, which takes the URL of the resource you'll eventually get around to retrieving. Note that this step doesn't actually retrieve anything yet.</para>
	283	<para>使用 &urllib2; 获取 HTTP 资源包括三个处理步骤, 这样做的原因很快就会清楚。
	284	第一步是创建 <classname>Request</classname> 对象, 它接受一个你最终想要获取资源的 URL。注意这一步实际上还不能获取任何东西。</para>
276	285	</callout>
277	286	<callout arearefs="oa.useragent.1.3">
278		<para>The second step is to build a URL opener. This can take any number of handlers, which control how responses are handled. But you can also build an opener without any custom handlers, which is what you're doing here. You'll see how to define and use custom handlers later in this chapter when you explore redirects.</para>
	287	<para>第二步是创建一个 URL 开启器 (opener)。这可以使用任何数量的操作者来控制响应的处理。但你也可以创建一个没有任何自定义处理的开启器, 这就是这里的操作方式。你将在本章后面探究重定向的部分看到如何定义和使用自定义操作者的内容。</para>
278	287	</callout>
279	288	<callout arearefs="oa.useragent.1.4">
280		<para>The final step is to tell the opener to open the URL, using the <classname>Request</classname> object you created. As you can see from all the debugging information that gets printed, this step actually retrieves the resource and stores the returned data in <varname>feeddata</varname>.</para>
	289	<para>最后一个步骤是, 使用你创建的 <classname>Request</classname> 对象告诉开启器打开 URL。正如你从打印出的所有调试信息中看到的, 这个步骤实际上获得了资源并且把返回数据存储在了 <varname>feeddata</varname> 中。</para>
280	289	</callout>
281	290	</calloutlist>
282	291	</example>
283	292	<example>
284		<title>~~Adding headers with the <classname>Request</classname>~~</title>
	293	<title>用 <classname>Request</classname> 添加头信息</title>
284	293	<screen>
285	294	&prompt;<userinput>request</userinput> <co id="oa.useragent.2.1"/>
…	…
312	321	<calloutlist>
313	322	<callout arearefs="oa.useragent.2.1">
314		<para>~~You're continuing from the previous example; you've already created a <classname>Request</classname> object with the URL you want to access.~~</para>
	323	<para>继续前面的例子; 你已经用你要访问的 URL 创建了 <classname>Request</classname> 。</para>
314	323	</callout>
315	324	<callout arearefs="oa.useragent.2.2">
316		<para>Using the <function>add_header</function> method on the <classname>Request</classname> object, you can add arbitrary HTTP headers to the request. The first argument is the header, the second is the value you're providing for that header. Convention dictates that a &useragent; should be in this specific format: an application name, followed by a slash, followed by a version number. The rest is free-form, and you'll see a lot of variations in the wild, but somewhere it should include a URL of your application. The &useragent; is usually logged by the server along with other details of your request, and including a URL of your application allows server administrators looking through their access logs to contact you if something is wrong.</para>
	325	<para>使用 <classname>Request</classname> 对象的 <function>add_header</function> 方法, 你能向请求中添加任意的 HTTP 头信息。第一个参数是头信息, 第二个参数是头信息的值。按照惯例, &useragent; 应该使用如下的特定格式: 应用名, 跟一个斜线, 跟版本号。剩下的是自由的格式, 你会在实际使用中看到许多变化, 但其中的某处应该包含你的应用的 URL。 &useragent; 通常和你的请求的其他详细信息一起被服务器记录下来, 包含你的应用的 URL 可以让服务器管理员在查看访问日志时, 如果发现问题就能与你联系。</para>
316	325	</callout>
317	326	<callout arearefs="oa.useragent.2.3">
318		<para>~~The <varname>opener</varname> object you created before can be reused too, and it will retrieve the same feed again, but with your custom &useragent; header.~~</para>
	327	<para>之前你创建的 <varname>opener</varname> 对象也可以重用, 且它将再次获得相同的 feed, 但是使用了你自定义的 &useragent; 头信息。</para>
318	327	</callout>
319	328	<callout arearefs="oa.useragent.2.4">
320		<para>And here's you sending your custom &useragent;, in place of the generic one that &python; sends by default. If you look closely, you'll notice that you defined a <literal>User-Agent</literal> header, but you actually sent a <literal>User-agent</literal> header. See the difference? &urllib2; changed the case so that only the first letter was capitalized. It doesn't really matter; HTTP specifies that header field names are completely case-insensitive.</para>
	329	<para>这就是你发送的自定义的 &useragent;, 代替了 &python; 默认发送的一般的 &useragent;。若你仔细看, 会注意到你定义的是 <literal>User-Agent</literal> 头信息, 但实际上发送的是 <literal>User-agent</literal> 头信息。看出不同了吗? &urllib2; 改变了大小写, 所以只有首字母是大写的。这没问题, 因为 HTTP 规定头字段名完全是大小写无关的。</para>
320	329	</callout>
321	330	</calloutlist>
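The header-name capitalization described in the last callout can be observed without sending anything. This is a small sketch using Python 3's `urllib.request` (the successor of `urllib2`); the URL and the User-Agent string are hypothetical placeholders, not values from the chapter.

```python
import urllib.request

# Placeholder URL; the Request object is built but never sent.
request = urllib.request.Request("http://example.com/xml/atom.xml")
request.add_header("User-Agent", "ExampleFeedReader/1.0 +http://example.com/reader/")

# Request stores header names with only the first letter capitalized.
stored_names = list(request.headers)
print(stored_names)  # prints ['User-agent']
```

Because HTTP header field names are case-insensitive, the server treats `User-agent` and `User-Agent` identically; the normalization only matters when looking headers up on the `Request` object itself.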
…	…
329	338	<section id="oa.etags">
330	339	<?dbhtml filename="http_web_services/etags.html"?>
331		<title>~~Handling &lastmodified; and~~ &etag;</title>
	340	<title>Handling &lastmodified; and &etag;</title>
331	340	<abstract>
332	341	<title/>
333		<para>~~Now that you know how to add custom HTTP headers to your web service requests, let's look at adding support for &lastmodified; and &etag; headers.~~</para>
	342	<para>Now that you know how to add custom HTTP headers to your web service requests, let's look at adding support for the &lastmodified; and &etag; headers.</para>
333	342	</abstract>
334		<para>These examples show the output with debugging turned off. If you still have it turned on from the previous section, you can turn it off by setting <literal>httplib.HTTPConnection.debuglevel = 0</literal>. Or you can just leave debugging on, if that helps you.</para>
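	343	<para>The following examples show the output with debugging turned off. If you still have it turned on from the previous section, you can turn it off by setting <literal>httplib.HTTPConnection.debuglevel = 0</literal>. Or, if you find it helpful, you can leave debugging on.</para>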
334	343	<example id="oa.etags.example.1">
335		<title>~~Testing~~ &lastmodified;</title>
	344	<title>Testing &lastmodified;</title>
335	344	<screen>
336	345	&prompt;<userinput>import urllib2</userinput>
…	…
374	383	<calloutlist>
375	384	<callout arearefs="oa.etags.1.1">
376		<para>Remember all those HTTP headers you saw printed out when you turned on debugging? This is how you can get access to them programmatically: <varname>firstdatastream.headers</varname> is <link linkend="fileinfo.userdict">an object that acts like a dictionary</link> and allows you to get any of the individual headers returned from the HTTP server.</para>
	385	<para>Remember all those HTTP headers you saw printed out when debugging was turned on?
	386	[todo]This is how you can get access to them programmatically:
	387
	388	<varname>firstdatastream.headers</varname> is <link linkend="fileinfo.userdict">an object that behaves like a dictionary</link>, and it lets you get any individual header returned from the HTTP server.</para>
377	389	</callout>
378	390	<callout arearefs="oa.etags.1.2">
379		<para>On the second request, you add the &ifmodifiedsince; header with the last-modified date from the first request. If the data hasn't changed, the server should return a <literal>304</literal> status code.</para>
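	391	<para>On the second request, you add the &ifmodifiedsince; header with the last-modified date obtained from the first request. If the data hasn't changed, the server should return a <literal>304</literal> status code.</para>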
379	391	</callout>
380	392	<callout arearefs="oa.etags.1.3">
381		<para>Sure enough, the data hasn't changed. You can see from the traceback that &urllib2; throws a special exception, <classname>HTTPError</classname>, in response to the <literal>304</literal> status code. This is a little unusual, and not entirely helpful. After all, it's not an error; you specifically asked the server not to send you any data if it hadn't changed, and the data didn't change, so the server told you it wasn't sending you any data. That's not an error; that's exactly what you were hoping for.</para>
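	393	<para>Sure enough, the data hasn't changed. You can see from the traceback that &urllib2; raises a special exception, <classname>HTTPError</classname>, in response to the <literal>304</literal> status code. This is a little unusual, and not entirely helpful. After all, it's not an error; you specifically asked the server not to send any data if nothing had changed, and the data didn't change, so the server told you it wasn't sending any data. That's not an error; it's exactly what you were hoping for.</para>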
381	393	</callout>
382	394	</calloutlist>
383	395	</example>
384		<para>&urllib2; also raises an <classname>HTTPError</classname> exception for conditions that you would think of as errors, such as <literal>404</literal> (page not found). In fact, it will raise <classname>HTTPError</classname> for <emphasis>any</emphasis> status code other than <literal>200</literal> (OK), <literal>301</literal> (permanent redirect), or <literal>302</literal> (temporary redirect). It would be more helpful for your purposes to capture the status code and simply return it, without throwing an exception. To do that, you'll need to define a custom URL handler.</para>
	396	<para>&urllib2; also raises an <classname>HTTPError</classname> exception for conditions that you would think of as errors, such as <literal>404</literal> (page not found). In fact, it raises <classname>HTTPError</classname> for <emphasis>any</emphasis> status code other than <literal>200</literal> (OK), <literal>301</literal> (permanent redirect), or <literal>302</literal> (temporary redirect). For your purposes, it would be more helpful to capture the status code and simply return it without raising an exception. To do that, you need to define a custom URL handler.</para>
384	396	<example>
385		<title>Defining URL handlers</title>
386		<para>This custom URL handler is part of &openanything_filename;.</para>
	397	<title>Defining URL handlers</title>
	398	<para>This custom URL handler is part of &openanything_filename;.</para>
387	399	<programlisting>
388	400	&oa_defaulthandler; <co id="oa.etags.2.1"/>
…	…
398	410	<calloutlist>
399	411	<callout arearefs="oa.etags.2.1">
400		<para>&urllib2; is designed around URL handlers. Each handler is just a class that can define any number of methods. When something happens -- like an HTTP error, or even a <literal>304</literal> code -- &urllib2; introspects into the list of defined handlers for a method that can handle it. You used a similar introspection in <xref linkend="kgp"/> to define handlers for different node types, but &urllib2; is more flexible, and introspects over as many handlers as are defined for the current request.</para>
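	412	<para>&urllib2; is designed around URL handlers. Each handler is just a class that can define any number of methods. When something happens -- like an HTTP error,
	413	or even a <literal>304</literal> code -- &urllib2; introspects the list of defined handlers for a method that can handle it. You used similar introspection in <xref linkend="kgp"/> to define handlers for different node types, but &urllib2; is more flexible, and introspects over as many handlers as are defined for the current request.</para>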
401	414	</callout>
402	415	<callout arearefs="oa.etags.2.2">
403		<para>&urllib2; searches through the defined handlers and calls the <methodname>http_error_default</methodname> method when it encounters a <literal>304</literal> status code from the server. By defining a custom error handler, you can prevent &urllib2; from raising an exception. Instead, you create the <classname>HTTPError</classname> object, but return it instead of raising it.</para>
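	416	<para>When it encounters a <literal>304</literal> status code from the server, &urllib2; searches through the defined handlers and calls the <methodname>http_error_default</methodname> method. By defining a custom error handler, you can prevent &urllib2; from raising an exception. Instead, you create the <classname>HTTPError</classname> object and return it rather than raising it.</para>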
403	416	</callout>
404	417	<callout arearefs="oa.etags.2.3">
405		<para>~~This is the key part: before returning, you save the status code returned by the HTTP server. This will allow you easy access to it from the calling program.~~</para>
	418	<para>This is the key part: before returning, you save the status code returned by the HTTP server. This gives the calling program easy access to it.</para>
405	418	</callout>
406	419	</calloutlist>
407	420	</example>
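The handler described in these callouts can be sketched roughly as follows. This is not the code from &openanything_filename;: it is a Python 3 adaptation under the assumption that `urllib.request` and `urllib.error` stand in for `urllib2`, it reads the status from the `code` attribute that `HTTPError` already carries instead of storing a separate attribute, and it calls the handler method by hand with made-up arguments so that no network access is needed.

```python
import io
import urllib.error
import urllib.request
from email.message import Message


class DefaultErrorHandler(urllib.request.HTTPDefaultErrorHandler):
    """Return the HTTPError object to the caller instead of raising it."""

    def http_error_default(self, req, fp, code, msg, headers):
        # HTTPError records the status code in its `code` attribute.
        return urllib.error.HTTPError(req.full_url, code, msg, headers, fp)


# An opener built this way uses the custom handler in place of the default one.
opener = urllib.request.build_opener(DefaultErrorHandler())

# Simulate the arguments the opener would pass for a 304 response.
request = urllib.request.Request("http://example.org/feed.xml")
handler = DefaultErrorHandler()
result = handler.http_error_default(
    request, io.BytesIO(b""), 304, "Not Modified", Message())
print(result.code)
```

In real use you would call `opener.open(request)` and inspect the returned object's `code`; the direct method call here only shows what the handler hands back.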
408	421	<example>
409		<title>~~Using custom URL handlers~~</title>
	422	<title>Using custom URL handlers</title>
409	422	<screen>
410	423	&prompt;<userinput>request.headers</userinput> <co id="oa.etags.3.1"/>
…	…
424	437	<calloutlist>
425	438	<callout arearefs="oa.etags.3.1">
426		<para>~~You're continuing the previous example, so the <classname>Request</classname> object is already set up, and you've already added the &ifmodifiedsince; header.~~</para>
	439	<para>Continuing the previous example, the <classname>Request</classname> object is already set up, and you've already added the &ifmodifiedsince; header.</para>
426	439	</callout>
427	440	<callout arearefs="oa.etags.3.2">
428		<para>This is the key: now that you've defined your custom URL handler, you need to tell &urllib2; to use it. Remember how I said that &urllib2; broke up the process of accessing an HTTP resource into three steps, and for good reason? This is why building the URL opener is its own step, because you can build it with your own custom URL handlers that override &urllib2;'s default behavior.</para>
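	441	<para>This is the key: now that you've defined your custom URL handler, you need to tell &urllib2; to use it. Remember how I said that &urllib2; broke up the process of accessing an HTTP resource into three steps,
	442	and for good reason? This is why building the URL opener is its own step, because you can build it with your own custom URL handlers that override &urllib2;'s default behavior.</para>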
429	443	</callout>
430	444	<callout arearefs="oa.etags.3.3">
431		<para>Now you can quietly open the resource, and what you get back is an object that, along with the usual headers (use <varname>seconddatastream.headers.dict</varname> to access them), also contains the HTTP status code. In this case, as you expected, the status is <literal>304</literal>, meaning this data hasn't changed since the last time you asked for it.</para>
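	445	<para>Now you can quietly open the resource, and what you get back is an object that, along with the usual headers (use <varname>seconddatastream.headers.dict</varname> to access them), also contains the HTTP status code. In this case, as you expected, the status is <literal>304</literal>, meaning the data hasn't changed since the last time you asked for it.</para>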
431	445	</callout>
432	446	<callout arearefs="oa.etags.3.4">
433		<para>Note that when the server sends back a <literal>304</literal> status code, it doesn't re-send the data. That's the whole point: to save bandwidth by not re-downloading data that hasn't changed. So if you actually want that data, you'll need to cache it locally the first time you get it.</para>
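	447	<para>Note that when the server sends back a <literal>304</literal> status code, it doesn't re-send the data. That's the whole point: you save bandwidth by not re-downloading data that hasn't changed. So if you actually want that data, you need to cache it locally the first time you get it.</para>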
433	447	</callout>
434	448	</calloutlist>
435	449	</example>
436		<para>Handling &etag; works much the same way, but instead of checking for &lastmodified; and sending &ifmodifiedsince;, you check for &etag; and send &ifnonematch;. Let's start with a fresh &ide; session.</para>
	450	<para>Handling &etag; works much the same way, but instead of checking for &lastmodified; and sending &ifmodifiedsince;, you check for &etag; and send &ifnonematch;. Let's start with a fresh &ide; session.</para>
436	450	<example id="oa.etags.example">
437	451	<title>Supporting &etag;/&ifnonematch;</title>
…	…
468	482	<calloutlist>
469	483	<callout arearefs="oa.etags.4.1">
470		<para>Using the <varname>firstdatastream.headers</varname> pseudo-dictionary, you can get the &etag; returned from the server. (What happens if the server didn't send back an &etag;? Then this line would return &none;.)</para>
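	484	<para>Using the <varname>firstdatastream.headers</varname> pseudo-dictionary, you can get the &etag; returned from the server. (What happens if the server didn't send back an &etag;? Then this line would return &none;.)</para>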
470	484	</callout>
471	485	<callout arearefs="oa.etags.4.2">
472		<para>OK, ~~you got the data.~~</para>
	486	<para>OK, you got the data.</para>
472	486	</callout>
473	487	<callout arearefs="oa.etags.4.3">
474		<para>~~Now set up the second call by setting the &ifnonematch; header to the &etag; you got from the first call.~~</para>
	488	<para>Now make the second call, setting the &ifnonematch; header to the &etag; you got from the first call.</para>
474	488	</callout>
475	489	<callout arearefs="oa.etags.4.4">
476		<para>The second call succeeds quietly (without throwing an exception), and once again you see that the server has sent back a <literal>304</literal> status code. Based on the &etag; you sent the second time, it knows that the data hasn't changed.</para>
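	490	<para>The second call succeeds quietly (without raising any exception), and once again you see the <literal>304</literal> status code returned from the server. Based on the &etag; you sent the second time, the server knows that the data hasn't changed.</para>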
476	490	</callout>
477	491	<callout arearefs="oa.etags.4.5">
478		<para>Regardless of whether the <literal>304</literal> is triggered by &lastmodified; date checking or &etag; hash matching, you'll never get the data along with the <literal>304</literal>. That's the whole point.</para>
	492	<para>Regardless of whether the <literal>304</literal> is triggered by &lastmodified; date checking or &etag; hash matching, you never get the data along with the <literal>304</literal>. That's the whole point.</para>
478	492	</callout>
479	493	</calloutlist>
480	494	</example>
481	495	<note id="tip.etag.vs.lastmodified">
482		<title>Support &lastmodified; <emphasis>and</emphasis> &etag;</title>
483		<para>In these examples, the HTTP server has supported both &lastmodified; and &etag; headers, but not all servers do. As a web services client, you should be prepared to support both, but you must code defensively in case a server only supports one or the other, or neither.</para>
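	496	<title>Support &lastmodified; <emphasis>and</emphasis> &etag;</title>
	497	<para>In these examples, the HTTP server supports both &lastmodified; and &etag; headers, but not all servers do. As a web services client, you should be prepared to support both, but your program should also be prepared for a server that supports only one of them, or neither.</para>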
484	498	</note>
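One way to code defensively, as the note above suggests, is to build the conditional request headers only from whatever validators the previous response actually contained. The sketch below is illustrative only: `conditional_headers` is a hypothetical helper, the header values are made up, and `email.message.Message` merely stands in for the dictionary-like headers object of a real response.

```python
from email.message import Message


def conditional_headers(previous_headers):
    """Build validators for the next request from a previous response."""
    headers = {}
    etag = previous_headers.get("ETag")                    # None if absent
    last_modified = previous_headers.get("Last-Modified")  # None if absent
    if etag is not None:
        headers["If-None-Match"] = etag
    if last_modified is not None:
        headers["If-Modified-Since"] = last_modified
    return headers


# Stand-ins for response headers from three kinds of servers.
both = Message()
both["ETag"] = '"abc123"'
both["Last-Modified"] = "Thu, 01 Jan 2004 00:00:00 GMT"

etag_only = Message()
etag_only["ETag"] = '"abc123"'

neither = Message()

print(conditional_headers(both))
print(conditional_headers(etag_only))
print(conditional_headers(neither))
```

A server that sent neither header simply yields an empty dictionary, so the next request goes out as an ordinary unconditional GET.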
485	499	</section>
…	…
492	506	<section id="oa.redirect">
493	507	<?dbhtml filename="http_web_services/redirects.html"?>
494		<title>~~Handling redirects~~</title>
	508	<title>Handling redirects</title>
494	508	<abstract>
495	509	<title/>
496		<para>~~You can support permanent and temporary redirects using a different kind of custom URL handler.~~</para>
	510	<para>You can support permanent and temporary redirects using a different kind of custom URL handler.</para>
496	510	</abstract>
497		<para>~~First, let's see why a redirect handler is necessary in the first place.~~</para>
	511	<para>First, let's see why a redirect handler is necessary.</para>
497	511	<example>
498		<title>~~Accessing web services without a redirect handler~~</title>
	512	<title>Accessing web services without a redirect handler</title>
498	512	<screen>
499	513	&prompt;<userinput>import urllib2, httplib</userinput>
…	…
553	567	<calloutlist>
554	568	<callout arearefs="oa.redirect.1.0">
555		<para>~~You'll be better able to see what's happening if you turn on debugging.~~</para>
	569	<para>You'll be better able to see what's happening if you turn on debugging.</para>
555	569	</callout>
556	570	<callout arearefs="oa.redirect.1.1">
557		<para>~~This is a URL which I have set up to permanently redirect to my Atom feed at <literal>http://diveintomark.org/xml/atom.xml</literal>.~~</para>
	571	<para>This is a URL that I have set up to permanently redirect to my Atom feed at <literal>http://diveintomark.org/xml/atom.xml</literal>.</para>
557	571	</callout>
558	572	<callout arearefs="oa.redirect.1.2">
559		<para>~~Sure enough, when you try to download the data at that address, the server sends back a <literal>301</literal> status code, telling you that the resource has moved permanently.~~</para>
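	573	<para>Sure enough, when you try to download the data at that address, the server sends back a <literal>301</literal> status code, telling you that the resource you requested has moved permanently.</para>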
559	573	</callout>
560	574	<callout arearefs="oa.redirect.1.3">
561		<para>~~The server also sends back a <literal>Location:</literal> header that gives the new address of this data.~~</para>
	575	<para>The server also sends back a <literal>Location:</literal> header that gives the new address of this data.</para>
561	575	</callout>
562	576	<callout arearefs="oa.redirect.1.4">
563		<para>&urllib2; ~~notices the redirect status code and automatically tries to retrieve the data at the new location specified in the <literal>Location:</literal> header.~~</para>
	577	<para>&urllib2; notices the redirect status code and automatically retrieves the data from the new address given in the <literal>Location:</literal> header.</para>
563	577	</callout>
564	578	<callout arearefs="oa.redirect.1.5">
565		<para>The object you get back from the <varname>opener</varname> contains the new permanent address and all the headers returned from the second request (retrieved from the new permanent address). But the status code is missing, so you have no way of knowing programmatically whether this redirect was temporary or permanent. And that matters very much: if it was a temporary redirect, then you should continue to ask for the data at the old location. But if it was a permanent redirect (as this was), you should ask for the data at the new location from now on.</para>
	579	<para>The object you get back from the <varname>opener</varname> contains the new permanent address and all the headers returned from the second request (retrieved from the new permanent address). But the status code is missing, so you have no way of knowing programmatically whether this redirect was temporary or permanent. And that matters very much: if it was a temporary redirect, then you should continue to ask for the data at the old location. But if it was a permanent redirect (as this was), you should ask for the data at the new location from now on.
	580
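	581	The object returned from the <varname>opener</varname> contains the new permanent address and all the headers from the second request (retrieved from the new permanent address). But the status code is gone, so you have no way of knowing whether the redirect was permanent or temporary. This matters a great deal: if it was a temporary redirect, you should keep using the old address to access the data. But if it was a permanent redirect (as in this example), you should use the new address from now on.</para>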
566	582	</callout>
567	583	</calloutlist>
568	584	</example>
569		<para>This is suboptimal, but easy to fix. &urllib2; doesn't behave exactly as you want it to when it encounters a <literal>301</literal> or <literal>302</literal>, so let's override its behavior. How? With a custom URL handler, <link linkend="oa.etags">just like you did to handle <literal>304</literal> codes</link>.</para>
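	585	<para>This is suboptimal, but easy to fix. &urllib2; doesn't behave exactly as you want when it encounters a <literal>301</literal> or <literal>302</literal>, so let's override its behavior. How? With a custom URL handler, <link linkend="oa.etags">just like you did to handle <literal>304</literal> codes</link>.</para>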
569	585	<example>
570		<title>Defining the redirect handler</title>
571		<para>This class is defined in &openanything_filename;.</para>
	586	<title>Defining the redirect handler</title>
	587	<para>This class is defined in &openanything_filename;.</para>
572	588	<programlisting>
573	589	&oa_smartredirect; <co id="oa.redirect.2.1"/>
…	…
592	608	<calloutlist>
593	609	<callout arearefs="oa.redirect.2.1">
594		<para>Redirect behavior is defined in &urllib2; in a class called <classname>HTTPRedirectHandler</classname>. You don't want to completely override the behavior, you just want to extend it a little, so you'll subclass <classname>HTTPRedirectHandler</classname> so you can call the ancestor class to do all the hard work.</para>
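	610	<para>Redirect behavior is defined in &urllib2; in a class called <classname>HTTPRedirectHandler</classname>. We don't want to completely override that behavior, only extend it a little, so we subclass <classname>HTTPRedirectHandler</classname>, which lets us still call the ancestor class to do all the original work.</para>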
594	610	</callout>
595	611	<callout arearefs="oa.redirect.2.2">
596		<para>When it encounters a <literal>301</literal> status code from the server, &urllib2; will search through its handlers and call the <methodname>http_error_301</methodname> method. The first thing ours does is just call the <methodname>http_error_301</methodname> method in the ancestor, which handles the grunt work of looking for the <literal>Location:</literal> header and following the redirect to the new address.</para>
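	612	<para>When it gets a <literal>301</literal> status code from the server, &urllib2; searches through its handlers and calls the <methodname>http_error_301</methodname> method. The first thing ours does is call the <methodname>http_error_301</methodname> method in the ancestor, which handles the work of looking for the <literal>Location:</literal> header and following the redirect to the new address.</para>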
596	612	</callout>
597	613	<callout arearefs="oa.redirect.2.3">
598		<para>~~Here's the key: before you return, you store the status code (<literal>301</literal>), so that the calling program can access it later.~~</para>
	614	<para>Here's the key: before you return, you store the status code (<literal>301</literal>), so that the calling program can access it later.</para>
598	614	</callout>
599	615	<callout arearefs="oa.redirect.2.4">
600		<para>~~Temporary redirects (status code <literal>302</literal>) work the same way: override the <literal>http_error_302</literal> method, call the ancestor, and save the status code before returning.~~</para>
	616	<para>Temporary redirects (status code <literal>302</literal>) work the same way: override the <literal>http_error_302</literal> method, call the ancestor, and save the status code before returning.</para>
600	616	</callout>
601	617	</calloutlist>
602	618	</example>
603		<para>~~So what has this bought us? You can now build a URL opener with the custom redirect handler, and it will still automatically follow redirects, but now it will also expose the redirect status code.~~</para>
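	619	<para>So what has this bought us? You can now build a URL opener with the custom redirect handler, and it will still follow redirects automatically, but now it also exposes the redirect status code.</para>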
603	619	<example>
604		<title>~~Using the redirect handler to detect permanent redirects~~</title>
	620	<title>Using the redirect handler to detect permanent redirects</title>
604	620	<screen>
605	621	&prompt;<userinput>request = urllib2.Request('http://diveintomark.org/redir/example301.xml')</userinput>
…	…
650	666	<calloutlist>
651	667	<callout arearefs="oa.redirect.3.1">
652		<para>~~First, build a URL opener with the redirect handler you just defined.~~</para>
	668	<para>First, build a URL opener with the redirect handler you just defined.</para>
652	668	</callout>
653	669	<callout arearefs="oa.redirect.3.2">
654		<para>You sent off a request, and you got a <literal>301</literal> status code in response. At this point, the <methodname>http_error_301</methodname> method gets called. You call the ancestor method, which follows the redirect and sends a request at the new location (<literal>http://diveintomark.org/xml/atom.xml</literal>).</para>
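	670	<para>You sent off a request, and you got a <literal>301</literal> status code in response.
	671	At this point, the <methodname>http_error_301</methodname> method gets called. You call the ancestor method, which follows the redirect and sends a request to the new location (<literal>http://diveintomark.org/xml/atom.xml</literal>).</para>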
655	672	</callout>
656	673	<callout arearefs="oa.redirect.3.3">
657		<para>This is the payoff: now, not only do you have access to the new URL, but you have access to the redirect status code, so you can tell that this was a permanent redirect. The next time you request this data, you should request it from the new location (<literal>http://diveintomark.org/xml/atom.xml</literal>, as specified in <varname>f.url</varname>). If you had stored the location in a configuration file or a database, you need to update that so you don't keep pounding the server with requests at the old address. It's time to update your address book.</para>
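	674	<para>This is the payoff: now, not only do you have access to the new URL, but you also have the redirect status code, so you can tell that this was a permanent redirect. The next time you request this data, you should request it from the new location (<literal>http://diveintomark.org/xml/atom.xml</literal>, as specified in <varname>f.url</varname>). If you had stored the location in a configuration file or a database, you need to update it so that you don't keep sending requests to the old address. It's time to update your address book.</para>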
657	674	</callout>
658	675	</calloutlist>
659	676	</example>
660		<para>~~The same redirect handler can also tell you that you <emphasis>shouldn't</emphasis> update your address book.~~</para>
	677	<para>The same redirect handler can also tell you that you <emphasis>shouldn't</emphasis> update your address book.</para>
660	677	<example>
661		<title>~~Using the redirect handler to detect temporary redirects~~</title>
	678	<title>Using the redirect handler to detect temporary redirects</title>
661	678	<screen>
662	679	&prompt;<userinput>request = urllib2.Request(</userinput>
…	…
702	719	<calloutlist>
703	720	<callout arearefs="oa.redirect.4.1">
704		<para>~~This is a sample URL I've set up that is configured to tell clients to <emphasis>temporarily</emphasis> redirect to <literal>http://diveintomark.org/xml/atom.xml</literal>.~~</para>
	721	<para>This is a sample URL that I've set up and configured to tell clients to <emphasis>temporarily</emphasis> redirect to <literal>http://diveintomark.org/xml/atom.xml</literal>.</para>
704	721	</callout>
705	722	<callout arearefs="oa.redirect.4.2">
706		<para>~~The server sends back a <literal>302</literal> status code, indicating a temporary redirect. The temporary new location of the data is given in the <literal>Location:</literal> header.~~</para>
	723	<para>The server sends back a <literal>302</literal> status code, indicating a temporary redirect. The temporary new location of the data is given in the <literal>Location:</literal> header.</para>
706	723	</callout>
707	724	<callout arearefs="oa.redirect.4.3">
708		<para>&urllib2; calls your <methodname>http_error_302</methodname> method, which calls the ancestor method of the same name in <classname>urllib2.HTTPRedirectHandler</classname>, which follows the redirect to the new location. Then your <methodname>http_error_302</methodname> method stores the status code (<literal>302</literal>) so the calling application can get it later.</para>
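	725	<para>&urllib2; calls your <methodname>http_error_302</methodname> method, which calls the ancestor method of the same name in <classname>urllib2.HTTPRedirectHandler</classname>, which follows the redirect to the new location. Then your <methodname>http_error_302</methodname> method stores the status code (<literal>302</literal>) so the calling application can get it later.</para>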
708	725	</callout>
709	726	<callout arearefs="oa.redirect.4.4">
710		<para>And here you are, having successfully followed the redirect to <literal>http://diveintomark.org/xml/atom.xml</literal>. <varname>f.status</varname> tells you that this was a temporary redirect, which means that you should continue to request data from the original address (<literal>http://diveintomark.org/redir/example302.xml</literal>). Maybe it will redirect next time too, but maybe not. Maybe it will redirect to a different address. It's not for you to say. The server said this redirect was only temporary, so you should respect that. And now you're exposing enough information that the calling application can respect that.</para>
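	727	<para>At this point, you have successfully followed the redirect to <literal>http://diveintomark.org/xml/atom.xml</literal>. <varname>f.status</varname> tells you that this was a temporary redirect, which means that you should continue to request data from the original address (<literal>http://diveintomark.org/redir/example302.xml</literal>). Maybe it will redirect next time too, but maybe not. Maybe it will redirect to a different address. It's not for you to say. The server said this redirect was only temporary, so you should respect that. And now you're exposing enough information for the calling application to respect it.</para>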
710	727	</callout>
711	728	</calloutlist>
…	…
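The rule spelled out in the two redirect examples (update your address book on a 301, leave it alone on a 302) can be written down as a small decision function. This is only a sketch: `url_to_remember` is a hypothetical helper, not part of &openanything_filename;, and it assumes the caller already has the final URL and the redirect status code that the custom handler exposes.

```python
def url_to_remember(requested_url, final_url, status):
    """Pick the address to use for the next request."""
    if status == 301:
        # Permanent redirect: switch to the new location from now on.
        return final_url
    # 302 (temporary) or no redirect at all: keep asking at the original address.
    return requested_url


feed = "http://diveintomark.org/xml/atom.xml"
permanent = url_to_remember(
    "http://diveintomark.org/redir/example301.xml", feed, 301)
temporary = url_to_remember(
    "http://diveintomark.org/redir/example302.xml", feed, 302)
print(permanent)
print(temporary)
```

Keeping the decision in one place means the stored address only changes when the server has explicitly said the move is permanent.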
719	736	<section id="oa.gzip">
720	737	<?dbhtml filename="http_web_services/gzip_compression.html"?>
721		<title>~~Handling compressed data~~</title>
	738	<title>Handling compressed data</title>
721	738	<abstract>
722	739	<title/>
723		<para>The last important HTTP feature you want to support is compression. Many web services have the ability to send data compressed, which can cut down the amount of data sent over the wire by 60% or more. This is especially true of XML web services, since XML data compresses very well.</para>
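	740	<para>The last important HTTP feature you want to support is compression. Many web services can send data compressed, which can cut the amount of data sent over the wire by 60% or more. This is especially true of XML web services, since XML data compresses very well.</para>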
723	740	</abstract>
724		<para>~~Servers won't give you compressed data unless you tell them you can handle it.~~</para>
	741	<para>Servers won't send you compressed data unless you tell them you can handle it.</para>
724	741	<example>
725		<title>~~Telling the server you would like compressed data~~</title>
	742	<title>Telling the server you would like compressed data</title>
725	742	<screen>
726	743	&prompt;<userinput>import urllib2, httplib</userinput>
…	…
755	772	<calloutlist>
756	773	<callout arearefs="oa.gzip.1.1">
757		<para>This is the key: once you've created your <classname>Request</classname> object, add an <literal>Accept-encoding</literal> header to tell the server you can accept gzip-encoded data. <literal>gzip</literal> is the name of the compression algorithm you're using. In theory there could be other compression algorithms, but <literal>gzip</literal> is the compression algorithm used by 99% of web servers.</para>
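	774	<para>This is the key: once you've created your <classname>Request</classname> object, add an <literal>Accept-encoding</literal> header to tell the server you can accept gzip-encoded data. <literal>gzip</literal> is the name of the compression algorithm you're using. In theory there could be other compression algorithms, but <literal>gzip</literal> is the one used by 99% of web servers.</para>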
757	774	</callout>
758	775	<callout arearefs="oa.gzip.1.2">
759		<para>~~There's your header going across the wire.~~</para>
	776	<para>There's your header going across the wire.</para>
759	776	</callout>
760	777	<callout arearefs="oa.gzip.1.3">
761		<para>~~And here's what the server sends back: the <literal>Content-Encoding: gzip</literal> header means that the data you're about to receive has been gzip-compressed.~~</para>
	778	<para>And here's what the server sends back: the <literal>Content-Encoding: gzip</literal> header means that the data you're about to receive has been gzip-compressed.</para>
761	778	</callout>
762	779	<callout arearefs="oa.gzip.1.4">
763		<para>The <literal>Content-Length</literal> header is the length of the compressed data, not the uncompressed data. As you'll see in a minute, the actual length of the uncompressed data was 15955, so gzip compression cut your bandwidth by over 60%!</para>
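	780	<para>The <literal>Content-Length</literal> header is the length of the compressed data, not the uncompressed data. As you'll see in a minute, the actual length of the uncompressed data was 15955, so gzip compression cut your bandwidth by over 60%!</para>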
763	780	</callout>
764	781	</calloutlist>
765	782	</example>
766	783	<example>
767		<title>~~Decompressing the data~~</title>
	784	<title>Decompressing the data</title>
767	784	<screen>
768	785	&prompt;<userinput>compresseddata = f.read()</userinput> <co id="oa.gzip.2.1"/>
…	…
793	810	<calloutlist>
794	811	<callout arearefs="oa.gzip.2.1">
795		<para>Continuing from the previous example, <varname>f</varname> is the file-like object returned from the URL opener. Using its <methodname>read()</methodname> method would ordinarily get you the uncompressed data, but since this data has been gzip-compressed, this is just the first step towards getting the data you really want.</para>
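	812	<para>Continuing from the previous example, <varname>f</varname> is the file-like object returned from the URL opener. Using its <methodname>read()</methodname> method would ordinarily get you the uncompressed data, but since this data has been gzip-compressed, this is just the first step towards getting the data you really want.</para>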
795	812	</callout>
796	813	<callout arearefs="oa.gzip.2.2">
797		<para>OK, this step is a little bit of messy workaround. &python; has a &gzip; module, which reads (and actually writes) gzip-compressed files on disk. But you don't have a file on disk, you have a gzip-compressed buffer in memory, and you don't want to write out a temporary file just so you can uncompress it. So what you're going to do is create a file-like object out of the in-memory data (<varname>compresseddata</varname>), using the &stringio_modulename; module. You first saw the &stringio_modulename; module in <link linkend="kgp.openanything.stringio.example">the previous chapter</link>, but now you've found another use for it.</para>
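	814	<para>OK, this step is a bit of a messy workaround. &python; has a &gzip; module, which reads (and actually writes) gzip-compressed files on disk. But you don't have a file on disk, only a gzip-compressed buffer in memory, and you don't want to write out a temporary file just to uncompress it. So what you're going to do is create a file-like object out of the in-memory data (<varname>compresseddata</varname>), using the &stringio_modulename; module. You first saw the &stringio_modulename; module in <link linkend="kgp.openanything.stringio.example">the previous chapter</link>, but now you've found another use for it.</para>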
797	814	</callout>
798	815	<callout arearefs="oa.gzip.2.3">
799		<para>~~Now you can create an instance of <classname>GzipFile</classname>, and tell it that its <quote>file</quote> is the file-like object <varname>compressedstream</varname>.~~</para>
	816	<para>Now you can create an instance of <classname>GzipFile</classname> and tell it that its <quote>file</quote> is the file-like object <varname>compressedstream</varname>.</para>
799	816	</callout>
800	817	<callout arearefs="oa.gzip.2.4">
801		<para>This is the line that does all the actual work: <quote>reading</quote> from <classname>GzipFile</classname> will decompress the data. Strange? Yes, but it makes sense in a twisted kind of way. <varname>gzipper</varname> is a file-like object which represents a gzip-compressed file. That <quote>file</quote> is not a real file on disk, though; <varname>gzipper</varname> is really just <quote>reading</quote> from the file-like object you created with &stringio_modulename; to wrap the compressed data, which is only in memory in the variable <varname>compresseddata</varname>. And where did that compressed data come from? You originally downloaded it from a remote HTTP server by <quote>reading</quote> from the file-like object you built with <function>urllib2.build_opener</function>. And amazingly, this all just works. Every step in the chain has no idea that the previous step is faking it.</para>
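	818	<para>This is the line that does all the actual work: <quote>reading</quote> from <classname>GzipFile</classname> decompresses the data. Strange? Yes, but it makes sense in a twisted kind of way. <varname>gzipper</varname> is a file-like object that represents a gzip-compressed file. That <quote>file</quote> is not a real file on disk, though; <varname>gzipper</varname> is really just <quote>reading</quote> from the file-like object you created with &stringio_modulename; to wrap the compressed data, which exists only in memory in the variable <varname>compresseddata</varname>. And where did that compressed data come from? You originally downloaded it from a remote HTTP server by <quote>reading</quote> from the file-like object you built with <function>urllib2.build_opener</function>. And amazingly, this all just works. [todo]Every step in the chain has no idea that the previous step is faking it.</para>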
801	818	</callout>
802	819	<callout arearefs="oa.gzip.2.5">
803		<para>~~Look ma, real data. (15955 bytes of it, in fact.)~~</para>
	820	<para>Look, real data (15955 bytes of it, in fact).</para>
803	820	</callout>
804	821	</calloutlist>
805	822	</example>
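The chain of file-like objects described in these callouts can be tried without a server. The sketch below is a Python 3 adaptation and rests on a few assumptions: `io.BytesIO` takes the place of the &stringio_modulename; module because the compressed data is bytes, and the payload is compressed locally with `gzip.compress` as a stand-in for what <varname>f.read()</varname> would return from a real response.

```python
import gzip
import io

# Stand-in for the bytes returned by f.read(); compressed locally here.
original = b"<feed>" + b"<entry>hello</entry>" * 100 + b"</feed>"
compresseddata = gzip.compress(original)

compressedstream = io.BytesIO(compresseddata)        # file-like wrapper
gzipper = gzip.GzipFile(fileobj=compressedstream)    # reads from that wrapper
data = gzipper.read()                                # decompresses on read

print(len(compresseddata), len(data))
```

The variable names mirror the ones used in the example above, so each line maps onto one callout: wrap the buffer, hand the wrapper to <classname>GzipFile</classname>, then read.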
806		<para><quote>But wait!</quote> I hear you cry. <quote>This could be even easier!</quote> I know what you're thinking. You're thinking that <varname>opener.open</varname> returns a file-like object, so why not cut out the &stringio_modulename; middleman and just pass <varname>f</varname> directly to <methodname>GzipFile</methodname>? OK, maybe you weren't thinking that, but don't worry about it, because it doesn't work.</para>
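	823	<para><quote>But wait!</quote> I hear you cry. <quote>This could be even easier!</quote> I know what you're thinking. You're thinking that <varname>opener.open</varname> returns a file-like object, so why not cut out the &stringio_modulename; middleman and just pass <varname>f</varname> directly to <methodname>GzipFile</methodname>? OK, maybe you weren't thinking that, but don't worry about it, because it doesn't work.</para>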
806	823	<example>
807		<title>~~Decompressing the data directly from the server~~</title>
	824	<title>Decompressing the data directly from the server</title>
807	824	<screen>
808	825	&prompt;<userinput>f = opener.open(request)</userinput> <co id="oa.gzip.3.1"/>
…	…
827	844	<calloutlist>
828	845	<callout arearefs="oa.gzip.3.1">
829		<para>~~Continuing from the previous example, you already have a <classname>Request</classname> object set up with an <literal>Accept-encoding: gzip</literal> header.~~</para>
	846	<para>Continuing from the previous example, you already have a <classname>Request</classname> object set up with an <literal>Accept-encoding: gzip</literal> header.</para>
829	846	</callout>
830	847	<callout arearefs="oa.gzip.3.2">
831		<para>Simply opening the request will get you the headers (though not download any data yet). As you can see from the returned <literal>Content-Encoding</literal> header, this data has been sent gzip-compressed.</para>
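	848	<para>Simply opening the request gets you the headers (though it doesn't download any data yet). As you can see from the returned <literal>Content-Encoding</literal> header, this data has been sent gzip-compressed.</para>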
831	848	</callout>
832	849	<callout arearefs="oa.gzip.3.3">
833		<para>Since <methodname>opener.open</methodname> returns a file-like object, and you know from the headers that when you read it, you're going to get gzip-compressed data, why not simply pass that file-like object directly to <classname>GzipFile</classname>? As you <quote>read</quote> from the <classname>GzipFile</classname> instance, it will <quote>read</quote> compressed data from the remote HTTP server and decompress it on the fly. It's a good idea, but unfortunately it doesn't work. Because of the way gzip compression works, <classname>GzipFile</classname> needs to save its position and move forwards and backwards through the compressed file. This doesn't work when the <quote>file</quote> is a stream of bytes coming from a remote server; all you can do with it is retrieve bytes one at a time, not move back and forth through the data stream. So the inelegant hack of using &stringio_modulename; is the best solution: download the compressed data, create a file-like object out of it with &stringio_modulename;, and then decompress the data from that.</para>
	850	<para>既然 <methodname>opener.open</methodname> 返回一个类似文件的对象, 并且你从头信息中已经知道读取它时将得到 gzip 压缩的数据, 为什么不简单地把那个类似文件的对象直接传递给 <classname>GzipFile</classname> 呢? 当你从 <classname>GzipFile</classname> 实例 <quote>读取</quote> 时, 它将从远程 HTTP 服务器 <quote>读取</quote> 被压缩的数据并且即时解压缩。这是个好主意, 但不幸的是它无法工作。由于 gzip 压缩的工作方式, <classname>GzipFile</classname> 需要保存它的位置, 并在压缩文件中前后移动。当 <quote>file</quote> 是来自远程服务器的字节流时, 这就行不通了; 你能做的只是一次获取一个字节, 而不能在数据流中来回移动。所以使用 &stringio_modulename; 这种看上去不雅的手段是最好的解决方案: 下载压缩的数据, 用 &stringio_modulename; 以它创建一个类似文件的对象, 然后从中解压缩数据。</para>
833	850	</callout>
834	851	</calloutlist>
…	…
841	858	<section id="oa.alltogether">
842	859	<?dbhtml filename="http_web_services/alltogether.html"?>
843		<title>~~Putting it all together~~</title>
	860	<title>全部放在一起</title>
843	860	<abstract>
844	861	<title/>
845		<para>~~You've seen all the pieces for building an intelligent HTTP web services client. Now let's see how they all fit together.~~</para>
	862	<para>你已经看到了构造一个智能的 HTTP web 服务客户端的所有片断。现在让我们看看如何将它们整合到一起。</para>
845	862	</abstract>
846	863	<example>
847		<title>The <function>openanything</function> function</title>
848		<para>This function is defined in &openanything_filename;.</para>
	864	<title><function>openanything</function> 函数</title>
	865	<para>这个函数定义在 &openanything_filename; 中。</para>
849	866	<programlisting>
850	867	&oa_def;
…	…
866	883	<calloutlist>
867	884	<callout arearefs="oa.alltogether.1.1">
868		<para>&urlparse; is a handy utility module for, you guessed it, parsing URLs. Its primary function, also called <function>urlparse</function>, takes a URL and splits it into a tuple of (scheme, domain, path, params, query string parameters, and fragment identifier). Of these, the only thing you care about is the scheme, to make sure that you're dealing with an HTTP URL (which &urllib2; can handle).</para>
	885	<para>&urlparse; 是一个解析 URL 的便捷工具模块。它的主要函数也叫 <function>urlparse</function>, 接受一个 URL 并将其拆分为一个包含 (scheme, domain, path, params, 查询串参数和片段标识符) 的 tuple。其中, 你唯一需要关心的就是 scheme, 以确认你处理的是一个 HTTP URL (&urllib2; 能够处理)。</para>
868	885	</callout>
869	886	<callout arearefs="oa.alltogether.1.2">
870		<para>You identify yourself to the HTTP server with the &useragent; passed in by the calling function. If no &useragent; was specified, you use a default one defined earlier in the &openanything_filename; module. You never use the default one defined by &urllib2;.</para>
	887	<para>使用调用函数传入的 &useragent; 向 HTTP 服务器表明你的身份。如果没有指定 &useragent;, 你会使用 &openanything_filename; 模块中先前定义的默认值。你永远不会使用 &urllib2; 定义的默认值。</para>
870	887	</callout>
871	888	<callout arearefs="oa.alltogether.1.3">
872		<para>~~If an &etag; hash was given, send it in the &ifnonematch; header.~~</para>
	889	<para>如果给出了 &etag;, 要在 &ifnonematch; 头信息中发送它。</para>
872	889	</callout>
873	890	<callout arearefs="oa.alltogether.1.4">
874		<para>~~If a last-modified date was given, send it in the &ifmodifiedsince; header.~~</para>
	891	<para>如果给出了最近修改日期, 要在 &ifmodifiedsince; 头信息中发送它。</para>
874	891	</callout>
875	892	<callout arearefs="oa.alltogether.1.5">
876		<para>~~Tell the server you would like compressed data if possible.~~</para>
	893	<para>告诉服务器, 如果可能的话你希望获取压缩数据。</para>
876	893	</callout>
877	894	<callout arearefs="oa.alltogether.1.6">
878		<para>Build a URL opener that uses <emphasis>both</emphasis> of the custom URL handlers: <classname>SmartRedirectHandler</classname> for handling <literal>301</literal> and <literal>302</literal> redirects, and <classname>DefaultErrorHandler</classname> for handling <literal>304</literal>, <literal>404</literal>, and other error conditions gracefully.</para>
	895	<para>使用 <emphasis>两个</emphasis> 自定义 URL 处理器创建一个 URL 开启器: <classname>SmartRedirectHandler</classname> 为了处理 <literal>301</literal> 和 <literal>302</literal> 重定向, 而 <classname>DefaultErrorHandler</classname> 为了处理 <literal>304</literal>, <literal>404</literal> 以及其它的错误条件。</para>
878	895	</callout>
879	896	<callout arearefs="oa.alltogether.1.7">
880		<para>~~That's it! Open the URL and return a file-like object to the caller.~~</para>
	897	<para>就这样! 打开 URL 并返回一个类似文件的对象给调用者。</para>
880	897	</callout>
881	898	</calloutlist>
882	899	</example>
883	900	<example>
884		<title>The <function>fetch</function> function</title>
885		<para>This function is defined in &openanything_filename;.</para>
	901	<title><function>fetch</function> 函数</title>
	902	<para>这个函数定义在 &openanything_filename; 中。</para>
886	903	<programlisting>
887	904	&oa_fetch_def;
…	…
915	932	<calloutlist>
916	933	<callout arearefs="oa.alltogether.2.1">
917		<para>~~First, you call the <function>openAnything</function> function with a URL, &etag; hash, &lastmodified; date, and &useragent;.~~</para>
	934	<para>首先, 你用 URL, &etag; hash, &lastmodified; 日期和 &useragent; 调用 <function>openAnything</function> 函数。</para>
917	934	</callout>
918	935	<callout arearefs="oa.alltogether.2.2">
919		<para>~~Read the actual data returned from the server. This may be compressed; if so, you'll decompress it later.~~</para>
	936	<para>读取从服务器返回的真实数据。这可能是被压缩的; 如果是, 将在后面进行解压缩。</para>
919	936	</callout>
920	937	<callout arearefs="oa.alltogether.2.3">
921		<para>Save the &etag; hash returned from the server, so the calling application can pass it back to you next time, and you can pass it on to <function>openAnything</function>, which can stick it in the &ifnonematch; header and send it to the remote server.</para>
	938	<para>保存从服务器返回的 &etag; hash, 这样调用程序下一次能把它传回给你, 你再将它传递给 <function>openAnything</function>, 后者会把它放到 &ifnonematch; 头信息中发送给远程服务器。</para>
921	938	</callout>
922	939	<callout arearefs="oa.alltogether.2.4">
923		<para>~~Save the &lastmodified; date too.~~</para>
	940	<para>也要保存 &lastmodified; 日期。</para>
923	940	</callout>
924	941	<callout arearefs="oa.alltogether.2.5">
925		<para>~~If the server says that it sent compressed data, decompress it.~~</para>
	942	<para>如果服务器说它发送的是压缩数据, 就执行解压缩。</para>
925	942	</callout>
926	943	<callout arearefs="oa.alltogether.2.6">
927		<para>~~If you got a URL back from the server, save it, and assume that the status code is <literal>200</literal> until you find out otherwise.~~</para>
	944	<para>如果服务器返回了一个 URL, 就保存它, 并在得知其它情况之前假定状态代码为 <literal>200</literal>。</para>
927	944	</callout>
928	945	<callout arearefs="oa.alltogether.2.7">
929		<para>~~If one of the custom URL handlers captured a status code, then save that too.~~</para>
	946	<para>如果其中一个自定义 URL 处理器捕获了一个状态代码, 也要保存下来。</para>
929	946	</callout>
930	947	</calloutlist>
931	948	</example>
932	949	<example>
933		<title>~~Using~~ &openanything_filename;</title>
	950	<title>使用 &openanything_filename;</title>
933	950	<screen>
934	951	&prompt;<userinput>import openanything</userinput>
…	…
965	982	<calloutlist>
966	983	<callout arearefs="oa.alltogether.3.1">
967		<para>~~The very first time you fetch a resource, you don't have an &etag; hash or &lastmodified; date, so you'll leave those out. (They're <link linkend="apihelper.optional">optional parameters</link>.~~)</para>
	984	<para>第一次获取资源时, 你没有 &etag; hash 或 &lastmodified; 日期, 所以你可以省略这些参数。 (它们是 <link linkend="apihelper.optional">可选参数</link>。)</para>
967	984	</callout>
968	985	<callout arearefs="oa.alltogether.3.2">
969		<para>What you get back is a dictionary of several useful headers, the HTTP status code, and the actual data returned from the server. &openanything_module; handles the gzip compression internally; you don't care about that at this level.</para>
	986	<para>你获得了一个 dictionary, 它包括几个有用的头信息, HTTP 状态代码和从服务器返回的真实数据。 &openanything_module; 在内部处理 gzip 压缩; 在本级别上你不必关心它。</para>
969	986	</callout>
970	987	<callout arearefs="oa.alltogether.3.3">
971		<para>~~If you ever get a <literal>301</literal> status code, that's a permanent redirect, and you need to update your URL to the new address.~~</para>
	988	<para>如果你得到一个 <literal>301</literal> 状态代码, 表示是个永久重定向, 你需要更新你的 URL 到新地址。</para>
971	988	</callout>
972	989	<callout arearefs="oa.alltogether.3.4">
973		<para>The second time you fetch the same resource, you have all sorts of information to pass back: a (possibly updated) URL, the &etag; from the last time, the &lastmodified; date from the last time, and of course your &useragent;.</para>
	990	<para>第二次获取相同的资源时, 你有各种信息可以传回去: URL (可能被更新了), 上一次访问获得的 &etag;, 上一次访问获得的 &lastmodified; 日期, 当然还有你的 &useragent;。</para>
973	990	</callout>
974	991	<callout arearefs="oa.alltogether.3.5">
975		<para>~~What you get back is again a dictionary, but the data hasn't changed, so all you got was a <literal>304</literal> status code and no data.~~</para>
	992	<para>你得到的仍然是一个 dictionary, 但是数据没有改变, 所以你得到的只是一个 <literal>304</literal> 状态代码而没有数据。</para>
975	992	</callout>
976	993	</calloutlist>
985	1002	<section id="oa.summary">
986	1003	<?dbhtml filename="http_web_services/summary.html"?>
987		<title>~~Summary~~</title>
	1004	<title>小结</title>
987	1004	<abstract>
988	1005	<title/>
989		<para>~~The &openanything_filename; and its functions should now make perfect sense.~~</para>
	1006	<para>&openanything_filename; 及其函数现在应该很好理解了。</para>
989	1006	</abstract>
990		<para>~~There are 5 important features of HTTP web services that every client should support~~:</para>
	1007	<para>每个客户端都应该支持 HTTP web 服务的 5 个重要特性:</para>
990	1007	<itemizedlist>
991		<listitem><para>Identifying your application <link linkend="oa.useragent">by setting a proper &useragent;</link>.</para></listitem>
992		<listitem><para>Handling <link linkend="oa.redirect">permanent redirects properly</link>.</para></listitem>
993		<listitem><para>Supporting <link linkend="oa.etags">&lastmodified; date checking</link> to avoid re-downloading data that hasn't changed.</para></listitem>
994		<listitem><para>Supporting <link linkend="oa.etags.example">&etag; hashes</link> to avoid re-downloading data that hasn't changed.</para></listitem>
995		<listitem><para>Supporting <link linkend="oa.gzip">gzip compression</link> to reduce bandwidth even when data <emphasis>has</emphasis> changed.</para></listitem>
	1008	<listitem><para> <link linkend="oa.useragent">通过设置适当的 &useragent;</link> 识别你的应用。</para></listitem>
	1009	<listitem><para> <link linkend="oa.redirect">适当地处理永久重定向</link>。</para></listitem>
	1010	<listitem><para>支持 <link linkend="oa.etags">&lastmodified; 日期检查</link> 从而避免在数据未改变的情况下重新下载数据。</para></listitem>
	1011	<listitem><para>支持 <link linkend="oa.etags.example">&etag; hash</link> 从而避免在数据未改变的情况下重新下载数据。</para></listitem>
	1012	<listitem><para>支持 <link linkend="oa.gzip">gzip 压缩</link> 从而即使在数据 <emphasis>已经</emphasis> 改变的情况下也能减少带宽占用。</para></listitem>
996	1013	</itemizedlist>
997	1014	</section>
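
The features listed in the summary can be sketched in a few lines. This is a minimal illustration, not the contents of the openanything module: it assumes Python 3, so `urllib.request` and `io.BytesIO` stand in for the `urllib2` and `StringIO` modules the chapter uses. The names `build_request`, `decode_body`, and `USER_AGENT` are hypothetical, and the custom redirect and error handlers are left out.

```python
import gzip
import io
import urllib.request

# Hypothetical default; a real client should identify itself properly.
USER_AGENT = "ExampleClient/1.0"


def build_request(url, etag=None, last_modified=None, agent=USER_AGENT):
    """Build a Request carrying the headers the chapter recommends."""
    request = urllib.request.Request(url)
    request.add_header("User-Agent", agent)
    if etag:
        # Send the ETag saved from the previous response, if any.
        request.add_header("If-None-Match", etag)
    if last_modified:
        # Send the Last-Modified date saved from the previous response, if any.
        request.add_header("If-Modified-Since", last_modified)
    # Ask for compressed data if the server can provide it.
    request.add_header("Accept-Encoding", "gzip")
    return request


def decode_body(raw, content_encoding=""):
    """Decompress a fully downloaded body when the server sent it gzipped."""
    if content_encoding == "gzip":
        # Wrap the in-memory bytes in a file-like object first, so GzipFile
        # reads from memory instead of the network stream.
        return gzip.GzipFile(fileobj=io.BytesIO(raw)).read()
    return raw
```

In use, the request would be opened with an opener, the body read in full, and the result passed to `decode_body` together with the response's `Content-Encoding` header. Doing the decompression after the download mirrors the chapter's approach of buffering the compressed data before handing it to `GzipFile`.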
