Changeset 248

Timestamp:
Mon Dec 12 01:16:18 2005
Author:
osmond
Message:

Translation complete; awaiting review.

Files:

  • zh-translations/branches/diveintopython-zh-5.4/zh-cn/xml/dialect.xml

    r201 r248  
    2 2 <chapter id="dialect">  
    3 3 <?dbhtml filename="html_processing/index.html"?>  
    4   <title>&html; Processing</title>  
      4 <title>&html; Processing</title>
    4 4 <titleabbrev id="dialect.numberonly">Chapter 8</titleabbrev>  
    5 5 <section id="dialect.divein">  
    6   <title>Diving in</title>  
      6 <title>Overview</title>
    6 6 <abstract>  
    7 7 <title/>  
    8   <para>I often see questions on &clp; like <quote>How can I list all the [headers|images|links] in my &html; document?</quote>  <quote>How do I parse/translate/munge the text of my &html; document but leave the tags alone?</quote>  <quote>How can I add/remove/quote attributes of all my &html; tags at once?</quote>  This chapter will answer all of these questions.</para>  
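      8 <para>
      9 I often see questions like these on &clp;: <quote>How can I list all the [headers|images|links] in my &html; document?</quote> <quote>How can I [parse|translate|munge] the text of my &html; document but leave the tags alone?</quote> <quote>How can I [add|remove|quote] attributes on all my &html; tags at once?</quote> This chapter answers all of these questions.</para>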
    9 10 </abstract>  
    10   <para>Here is a complete, working &python; program in two parts.  The first part, &basehtml_filename;, is a generic tool to help you process &html; files by walking through the tags and text blocks.  The second part, &dialect_filename;, is an example of how to use &basehtml_filename; to translate the text of an &html; document but leave the tags alone.  Read the &docstring;s and comments to get an overview of what's going on.  Most of it will seem like black magic, because it's not obvious how any of these class methods ever get called.  Don't worry, all will be revealed in due time.</para>  
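      11 <para>Here is a complete, working &python; program in two parts. The first part, &basehtml_filename;, is a generic tool that helps you process &html; files by walking through the tags and text blocks. The second part, &dialect_filename;, is an example of how to use &basehtml_filename; to translate the text of an &html; document while leaving the tags alone. Read the &docstring;s and comments for an overview of what is going on. Most of it will look like black magic, because it is not obvious how any of these class methods ever get called. Don't worry; all will be revealed in due time.</para>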
    10 11 <example id="dialect.basehtml.listing">  
    11 12 <title>&basehtml_filename;</title>  
     
    23 24 </example>  
    24 25 <example>  
    25   <title>Output of &dialect_filename;</title>  
    26   <para>Running this script will translate <xref linkend="odbchelper.list"/> into <ulink url="../native_data_types/chef.html">mock Swedish Chef-speak</ulink> (from The Muppets), <ulink url="../native_data_types/fudd.html">mock Elmer Fudd-speak</ulink> (from Bugs Bunny cartoons), and <ulink url="../native_data_types/olde.html">mock Middle English</ulink> (loosely based on Chaucer's <citetitle>The Canterbury Tales</citetitle>).  If you look at the &html; source of the output pages, you'll see that all the &html; tags and attributes are untouched, but the text between the tags has been <quote>translated</quote> into the mock language.  If you look closer, you'll see that, in fact, only the titles and paragraphs were translated; the code listings and screen examples were left untouched.</para>  
      26 <title>Output of &dialect_filename;</title>
      27 <para>Running this script translates <xref linkend="odbchelper.list"/> into <ulink url="../native_data_types/chef.html">mock Swedish Chef-speak</ulink> (from The Muppets), <ulink url="../native_data_types/fudd.html">mock Elmer Fudd-speak</ulink> (from Bugs Bunny cartoons), and <ulink url="../native_data_types/olde.html">mock Middle English</ulink> (loosely based on Chaucer's <citetitle>The Canterbury Tales</citetitle>). If you look at the &html; source of the output pages, you will see that all the &html; tags and attributes are untouched, but the text between the tags has been translated into the mock language. If you look more closely, you will see that only the titles and paragraphs were translated; the code listings and screen examples were left untouched.</para>
    27 28 <programlisting>  
    28 29 &lt;div class="abstract">  
     
    38 39 <section id="dialect.sgmllib">  
    39 40 <?dbhtml filename="html_processing/introducing_sgmllib.html"?>  
    40   <title>Introducing &sgmllib_filename;</title>  
      41 <title>Introducing &sgmllib_filename;</title>
    40 41 <abstract>  
    41 42 <title/>  
    42   <para>&html; processing is broken into three steps: breaking down the &html; into its constituent pieces, fiddling with the pieces, and reconstructing the pieces into &html; again.  The first step is done by &sgmllib_filename;, a part of the standard &python; library.</para>  
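      43 <para>&html; processing takes three steps: breaking the &html; down into its constituent pieces, working on the pieces, and reassembling the pieces into &html;. The first step is done by &sgmllib_filename;, a part of the standard &python; library.</para>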
    42 43 </abstract>  
    43   <para>The key to understanding this chapter is to realize that &html; is not just text, it is structured text.  The structure is derived from the more-or-less-hierarchical sequence of start tags and end tags.  Usually you don't work with &html; this way; you work with it <emphasis>textually</emphasis> in a text editor, or <emphasis>visually</emphasis> in a web browser or web authoring tool.  &sgmllib_filename; presents &html; <emphasis>structurally</emphasis>.</para>  
    44   <para>&sgmllib_filename; contains one important class: &sgmlparser;.  &sgmlparser; parses &html; into useful pieces, like start tags and end tags.  As soon as it succeeds in breaking down some data into a useful piece, it calls a method on itself based on what it found.  In order to use the parser, you subclass the &sgmlparser; class and override these methods.  This is what I meant when I said that it presents &html; <emphasis>structurally</emphasis>: the structure of the &html; determines the sequence of method calls and the arguments passed to each method.</para>  
    45   <para>&sgmlparser; parses &html; into 8 kinds of data, and calls a separate method for each of them:</para>  
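      44 <para>The key to understanding this chapter is to realize that &html; is not just text; it is structured text. The structure comes from the more-or-less hierarchical sequence of start tags and end tags. Usually you don't work with &html; this way; you work with it <emphasis>textually</emphasis> in a text editor, or <emphasis>visually</emphasis> in a web browser or web authoring tool. &sgmllib_filename; presents &html; <emphasis>structurally</emphasis>.</para>
      45 <para>&sgmllib_filename; contains one important class: &sgmlparser;. &sgmlparser; parses &html; into useful pieces, such as start tags and end tags. As soon as it succeeds in breaking some data down into a useful piece, it calls a method on itself based on what it found. To use the parser, you subclass the &sgmlparser; class and override these methods. This is what I meant when I said that it presents &html; <emphasis>structurally</emphasis>: the structure of the &html; determines the sequence of method calls and the arguments passed to each method.</para>
      46 <para>&sgmlparser; parses &html; into 8 kinds of data, and calls a separate method for each of them:</para>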
    46 47 <variablelist>  
    47 48 <varlistentry>  
    48   <term>Start tag</term>  
    49   <listitem><para>An &html; tag that starts a block, like <sgmltag>&lt;html></sgmltag>, <sgmltag>&lt;head></sgmltag>, <sgmltag>&lt;body></sgmltag>, or <sgmltag>&lt;pre></sgmltag>, or a standalone tag like <sgmltag>&lt;br></sgmltag> or <sgmltag>&lt;img></sgmltag>.  When it finds a start tag <replaceable>tagname</replaceable>, &sgmlparser; will look for a method called <function>start_<replaceable>tagname</replaceable></function> or <function>do_<replaceable>tagname</replaceable></function>.  For instance, when it finds a <sgmltag>&lt;pre></sgmltag> tag, it will look for a <function>start_pre</function> or <function>do_pre</function> method.  If found, &sgmlparser; calls this method with a list of the tag's attributes; otherwise, it calls &unknown_starttag; with the tag name and list of attributes.</para></listitem>  
      49 <term>Start tag</term>
      50 <listitem><para>An &html; tag that starts a block, such as <sgmltag>&lt;html></sgmltag>, <sgmltag>&lt;head></sgmltag>, <sgmltag>&lt;body></sgmltag>, or <sgmltag>&lt;pre></sgmltag>, or a standalone tag such as <sgmltag>&lt;br></sgmltag> or <sgmltag>&lt;img></sgmltag>. When it finds a start tag <replaceable>tagname</replaceable>, &sgmlparser; looks for a method called <function>start_<replaceable>tagname</replaceable></function> or <function>do_<replaceable>tagname</replaceable></function>. For instance, when it finds a <sgmltag>&lt;pre></sgmltag> tag, it looks for a <function>start_pre</function> or <function>do_pre</function> method. If it finds one, &sgmlparser; calls that method with a list of the tag's attributes; otherwise, it calls &unknown_starttag; with the tag name and the list of attributes.</para></listitem>
    50 51 </varlistentry>  
    51 52 <varlistentry>  
    52   <term>End tag</term>  
    53   <listitem><para>An &html; tag that ends a block, like <sgmltag>&lt;/html></sgmltag>, <sgmltag>&lt;/head></sgmltag>, <sgmltag>&lt;/body></sgmltag>, or <sgmltag>&lt;/pre></sgmltag>.  When it finds an end tag, &sgmlparser; will look for a method called <function>end_<replaceable>tagname</replaceable></function>.  If found, &sgmlparser; calls this method, otherwise it calls &unknown_endtag; with the tag name.</para></listitem>  
      53 <term>End tag</term>
      54 <listitem><para>An &html; tag that ends a block, such as <sgmltag>&lt;/html></sgmltag>, <sgmltag>&lt;/head></sgmltag>, <sgmltag>&lt;/body></sgmltag>, or <sgmltag>&lt;/pre></sgmltag>. When it finds an end tag, &sgmlparser; looks for a method called <function>end_<replaceable>tagname</replaceable></function>. If it finds one, &sgmlparser; calls that method; otherwise, it calls &unknown_endtag; with the tag name.</para></listitem>
    54 55 </varlistentry>  
    55 56 <varlistentry>  
    56   <term>Character reference</term>  
    57   <listitem><para>An escaped character referenced by its decimal or hexadecimal equivalent, like <literal>&amp;#160;</literal>.  When found, &sgmlparser; calls &handle_charref; with the text of the decimal or hexadecimal character equivalent.</para></listitem>  
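      57 <term>Character reference</term>
      58 <listitem><para>An escaped character referenced by its decimal or hexadecimal equivalent, such as <literal>&amp;#160;</literal>. When it finds one, &sgmlparser; calls &handle_charref; with the text of the decimal or hexadecimal character equivalent.</para></listitem>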
    58 59 </varlistentry>  
    59 60 <varlistentry>  
    60   <term>Entity reference</term>  
    61   <listitem><para>An &html; entity, like <literal>&amp;copy;</literal>.  When found, &sgmlparser; calls &handle_entityref; with the name of the &html; entity.</para></listitem>  
      61 <term>Entity reference</term>
      62 <listitem><para>An &html; entity, such as <literal>&amp;copy;</literal>. When it finds one, &sgmlparser; calls &handle_entityref; with the name of the &html; entity.</para></listitem>
    62 63 </varlistentry>  
    63 64 <varlistentry>  
    64   <term>Comment</term>  
    65   <listitem><para>An &html; comment, enclosed in <literal>&lt;!-- ... --></literal>.  When found, &sgmlparser; calls &handle_comment; with the body of the comment.</para></listitem>  
      65 <term>Comment</term>
      66 <listitem><para>An &html; comment, enclosed in <literal>&lt;!-- ... --></literal>. When it finds one, &sgmlparser; calls &handle_comment; with the body of the comment.</para></listitem>
    66 67 </varlistentry>  
    67 68 <varlistentry>  
    68   <term>Processing instruction</term>  
    69   <listitem><para>An &html; processing instruction, enclosed in <literal>&lt;? ... ></literal>.  When found, &sgmlparser; calls &handle_pi; with the body of the processing instruction.</para></listitem>  
      69 <term>Processing instruction</term>
      70 <listitem><para>An &html; processing instruction, enclosed in <literal>&lt;? ... ></literal>. When it finds one, &sgmlparser; calls &handle_pi; with the body of the processing instruction.</para></listitem>
    70 71 </varlistentry>  
    71 72 <varlistentry>  
    72   <term>Declaration</term>  
    73   <listitem><para>An &html; declaration, such as a &doctype;, enclosed in <literal>&lt;! ... ></literal>.  When found, &sgmlparser; calls &handle_decl; with the body of the declaration.</para></listitem>  
      73 <term>Declaration</term>
      74 <listitem><para>An &html; declaration, such as a &doctype;, enclosed in <literal>&lt;! ... ></literal>. When it finds one, &sgmlparser; calls &handle_decl; with the body of the declaration.</para></listitem>
    74 75 </varlistentry>  
    75 76 <varlistentry>  
    76   <term>Text data</term>  
    77   <listitem><para>A block of text.  Anything that doesn't fit into the other 7 categories.  When found, &sgmlparser; calls &handle_data; with the text.</para></listitem>  
      77 <term>Text data</term>
      78 <listitem><para>A block of text: anything that does not fit into the other 7 categories. When it finds one, &sgmlparser; calls &handle_data; with the text.</para></listitem>
    78 79 </varlistentry>  
    79 80 </variablelist>  
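The eight kinds of data listed above can be watched in action with a small event logger. This is only a sketch under stated assumptions: sgmllib was removed in Python 3, so the example assumes Python 3 and substitutes html.parser.HTMLParser from the standard library, which follows the same callback style but names its tag handlers handle_starttag and handle_endtag rather than unknown_starttag and unknown_endtag. The class name EventLogger and the sample markup are hypothetical, invented for illustration.

```python
from html.parser import HTMLParser


class EventLogger(HTMLParser):
    """Record every piece the parser reports, tagged with its kind."""

    def __init__(self):
        # convert_charrefs=False keeps character and entity references as
        # separate events instead of folding them into the text data.
        super().__init__(convert_charrefs=False)

    def reset(self):
        # reset() is called by __init__ and may be called again for reuse.
        super().reset()
        self.events = []

    def handle_starttag(self, tag, attrs):
        self.events.append(("start tag", tag, attrs))

    def handle_endtag(self, tag):
        self.events.append(("end tag", tag))

    def handle_charref(self, name):
        self.events.append(("character reference", name))

    def handle_entityref(self, name):
        self.events.append(("entity reference", name))

    def handle_comment(self, data):
        self.events.append(("comment", data))

    def handle_pi(self, data):
        self.events.append(("processing instruction", data))

    def handle_decl(self, decl):
        self.events.append(("declaration", decl))

    def handle_data(self, data):
        self.events.append(("text data", data))


# Hypothetical sample markup touching all eight kinds of data.
sample = '<!DOCTYPE html><p id="x">a&amp;b&#160;c</p><!-- note --><?proc color="red">'
logger = EventLogger()
logger.feed(sample)
logger.close()
for event in logger.events:
    print(event)
```

Each tuple printed corresponds to one handler call, so the order of the tuples is the order in which the structure of the markup drives the method calls.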
    80 81 <important>  
    81   <title>Language evolution: &doctype;</title>  
    82   <para>&python; 2.0 had a bug where &sgmlparser; would not recognize declarations at all (&handle_decl; would never be called), which meant that &doctype;s were silently ignored.  This is fixed in &python; 2.1.</para>  
      82 <title>Language evolution: &doctype;</title>
      83 <para>&python; 2.0 had a bug where &sgmlparser; did not recognize declarations at all (&handle_decl; was never called), which meant that &doctype;s were silently ignored. This bug is fixed in &python; 2.1.</para>
    83 84 </important>  
    84   <para>&sgmllib_filename; comes with a test suite to illustrate this.  You can run &sgmllib_filename;, passing the name of an &html; file on the command line, and it will print out the tags and other elements as it parses them.  It does this by subclassing the &sgmlparser; class and defining &unknown_starttag;, &unknown_endtag;, &handle_data; and other methods which simply print their arguments.</para>  
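      85 <para>&sgmllib_filename; comes with a test suite that illustrates this. You can run &sgmllib_filename;, passing the name of an &html; file on the command line, and it prints out the tags and other elements as it parses them. It does this by subclassing the &sgmlparser; class and defining &unknown_starttag;, &unknown_endtag;, &handle_data;, and other methods that simply print their arguments.</para>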
    84 85 <tip id="tip.commandline.windows">  
    85   <title>Specifying command line arguments in &windows;</title>  
    86   <para>In the &activepython; &ide; on &windows;, you can specify command line arguments in the <quote>Run script</quote> dialog.  Separate multiple arguments with spaces.</para>  
      86 <title>Specifying command line arguments in &windows;</title>
      87 <para>In the &activepython; &ide; on &windows;, you can specify command line arguments in the <quote>Run script</quote> dialog. Separate multiple arguments with spaces.</para>
    87 88 </tip>  
    88 89 <example>  
    89   <title>Sample test of &sgmllib_filename;</title>  
    90   <para>Here is a snippet from the table of contents of the &html; version of this book.  Of course your paths may vary.  (If you haven't downloaded the &html; version of the book, you can do so at <ulink url="&url_diveintopython;"/>.)</para>
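      90 <title>Sample test of &sgmllib_filename;</title>
      91 <para>Here is a snippet from the table of contents of the &html; version of this book, toc\index.html. Of course, your paths may differ from mine.
      92 (If you haven't downloaded the &html; version of the book, you can do so at <ulink url="&url_diveintopython;"/>.)</para>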
    91 93 <screen>  
    92 94 <prompt>c:\python23\lib></prompt> <userinput>type "c:\downloads\diveintopython\html\toc\index.html"</userinput>  
     
    104 106       &lt;link rel="stylesheet" href="diveintopython.css" type="text/css">  
    105 107  
    106   ... rest of file omitted for brevity ...  
      108 ...  ...  
    106 108 </literal></screen>  
    107   <para>Running this through the test suite of &sgmllib_filename; yields this output:</para>  
      109 <para>Running this through the test suite of &sgmllib_filename; yields this output:</para>
    107 109 <screen>  
    108 110 <prompt>c:\python23\lib></prompt> <userinput>python sgmllib.py "c:\downloads\diveintopython\html\toc\index.html"</userinput>  
     
    123 125 data: '\n      '  
    124 126  
    125   ... rest of output omitted for brevity ...  
      127 ...  ...  
    125 127 </computeroutput></screen>  
    126 128 </example>  
    127   <para>Here's the roadmap for the rest of the chapter:</para>  
      129 <para>Here is the roadmap for the rest of the chapter:</para>
    127 129 <itemizedlist>  
    128   <listitem><para>Subclass &sgmlparser; to create classes that extract interesting data out of &html; documents.</para></listitem>  
    129   <listitem><para>Subclass &sgmlparser; to create &basehtml_classname;, which overrides all 8 handler methods and uses them to reconstruct the original &html; from the pieces.</para></listitem>  
    130   <listitem><para>Subclass &basehtml_classname; to create &dialect_classname;, which adds some methods to process specific &html; tags specially, and overrides the &handle_data; method to provide a framework for processing the text blocks between the &html; tags.</para></listitem>  
    131   <listitem><para>Subclass &dialect_classname; to create classes that define text processing rules used by <function>&dialect_name;.handle_data</function>.</para></listitem>  
    132   <listitem><para>Write a test suite that grabs a real web page from &diveintopythonorg; and processes it.</para></listitem>  
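      130 <listitem><para>Subclass &sgmlparser; to create classes that extract interesting data from &html; documents.</para></listitem>
      131 <listitem><para>Subclass &sgmlparser; to create &basehtml_classname;, which overrides all 8 handler methods and uses them to reconstruct the original &html; from the pieces.</para></listitem>
      132 <listitem><para>Subclass &basehtml_classname; to create &dialect_classname;, which adds some methods that handle specific &html; tags specially, and overrides the &handle_data; method to provide a framework for processing the text blocks between the &html; tags.</para></listitem>
      133 <listitem><para>Subclass &dialect_classname; to create classes that define the text processing rules used by <function>&dialect_name;.handle_data</function>.</para></listitem>
      134 <listitem><para>Write a test suite that grabs a real web page from &diveintopythonorg; and processes it.</para></listitem>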
    133 135 </itemizedlist>  
    134   <para>Along the way, you'll also learn about &locals;, &globals;, and dictionary-based string formatting.</para>  
      136 <para>Along the way, you will also learn about &locals;, &globals;, and dictionary-based string formatting.</para>
    134 136 </section>  
    135 137 <section id="dialect.extract">  
    136 138 <?dbhtml filename="html_processing/extracting_data.html"?>  
    137   <title>Extracting data from &html; documents</title>  
      139 <title>Extracting data from &html; documents</title>
    137 139 <abstract>  
    138 140 <title/>  
    139   <para>To extract data from &html; documents, subclass the &sgmlparser; class and define methods for each tag or entity you want to capture.</para>  
      141 <para>To extract data from &html; documents, subclass the &sgmlparser; class and define a method for each tag or entity you want to capture.</para>
    139 141 </abstract>  
    140   <para>The first step to extracting data from an &html; document is getting some &html;.  If you have some &html; lying around on your hard drive, you can use <link linkend="fileinfo.files">file functions</link> to read it, but the real fun begins when you get &html; from live web pages.</para>  
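      142 <para>The first step in extracting data from an &html; document is getting some &html;. If you have &html; files on your hard drive, you can use <link linkend="fileinfo.files">file functions</link> to read them, but the real fun begins when you get &html; from live web pages.</para>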
    140 142 <example id="dialect.extract.urllib">  
    141   <title>Introducing &urllib;</title>  
      143 <title>Introducing &urllib;</title>
    141 143 <screen>  
    142 144 &prompt;<userinput>import urllib</userinput>                                       <co id="dialect.extract.1.1"/>  
     
    166 168 &lt;tr&gt;&lt;td class='tagline' colspan='2'&gt;Python&amp;nbsp;for&amp;nbsp;experienced&amp;nbsp;programmers&lt;/td&gt;&lt;/tr&gt;</computeroutput>  
    167 169  
    168   [...snip...]</screen>  
      170 [......]</screen>  
    168 170 <calloutlist>  
    169 171 <callout arearefs="dialect.extract.1.1">  
    170   <para>The &urllib; module is part of the standard &python; library.  It contains functions for getting information about and actually retrieving data from Internet-based &url;s (mainly web pages).</para>  
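      172 <para>The &urllib; module is part of the standard &python; library. It contains functions for getting information about, and actually retrieving data from, Internet-based &url;s (mainly web pages).</para>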
    170 172 </callout>  
    171 173 <callout arearefs="dialect.extract.1.2">  
    172   <para>The simplest use of &urllib; is to retrieve the entire text of a web page using the &urlopen; function.  Opening a &url; is similar to <link linkend="fileinfo.files">opening a file</link>.  The return value of &urlopen; is a file-like object, which has some of the same methods as a file object.</para>  
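      174 <para>The simplest use of &urllib; is to retrieve the entire text of a web page with the &urlopen; function. Opening a &url; is similar to <link linkend="fileinfo.files">opening a file</link>. The return value of &urlopen; is a file-like object, which has some of the same methods as a file object.</para>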
    172 174 </callout>  
    173 175 <callout arearefs="dialect.extract.1.3">  
    174   <para>The simplest thing to do with the file-like object returned by &urlopen; is &read;, which reads the entire &html; of the web page into a single string.  The object also supports &readlines;, which reads the text line by line into a list.</para>  
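      176 <para>The simplest thing to do with the file-like object returned by &urlopen; is &read;, which reads the entire &html; of the web page into a single string. The object also supports &readlines;, which reads the text line by line into a list.</para>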
    174 176 </callout>  
    175 177 <callout arearefs="dialect.extract.1.4">  
    176   <para>When you're done with the object, make sure to &close; it, just like a normal file object.</para>  
      178 <para>When you are done with the object, make sure to &close; it, just like a normal file object.</para>
    176 178 </callout>  
    177 179 <callout arearefs="dialect.extract.1.5">  
    178   <para>You now have the complete &html; of the home page of &diveintopythonorg; in a string, and you're ready to parse it.</para>  
      180 <para>You now have the complete &html; of the home page of &diveintopythonorg; in a string, and you are ready to parse it.</para>
    178 180 </callout>  
    179 181 </calloutlist>  
    180 182 </example>  
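The open, read, close sequence described in the callouts above can be sketched as follows. This is an illustration under assumptions, not the book's listing: it assumes Python 3, where urlopen lives in urllib.request and read() returns bytes, and it uses a data: URL as a stand-in for a live page so that the sketch runs without a network connection. The helper name fetch and the sample page are hypothetical.

```python
from urllib.parse import quote
from urllib.request import urlopen


def fetch(url):
    """Return the full body of url as text, closing the connection afterwards."""
    sock = urlopen(url)            # a file-like object
    try:
        raw = sock.read()          # the whole response body (bytes in Python 3)
    finally:
        sock.close()               # release the connection, like closing a file
    return raw.decode("utf-8")     # assumption: the page is UTF-8 encoded


# A data: URL stands in for a real web address so no network access is needed.
page = "<html><head><title>Dive Into Python</title></head></html>"
html_source = fetch("data:text/html," + quote(page))
print(html_source)
```

With a real address in place of the data: URL, html_source would hold the complete &html; of that page, ready to be handed to a parser.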
    181 183 <example id="dialect.extract.links">  
    182   <title>Introducing &urllister_filename;</title>  
      184 <title>Introducing &urllister_filename;</title>
    182 184 &para_download;  
    183 185 <programlisting>  
     
    202 204 <calloutlist>  
    203 205 <callout arearefs="dialect.extract.2.1">  
    204   <para><function>reset</function> is called by the &init; method of &sgmlparser;, and it can also be called manually once an instance of the parser has been created.  So if you need to do any initialization, do it in &reset;, not in &init;, so that it will be re-initialized properly when someone re-uses a parser instance.</para>  
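      206 <para><function>reset</function> is called by the &init; method of &sgmlparser;, and it can also be called manually once an instance of the parser has been created. So if you need to do any initialization, do it in &reset;, not in &init;, so that everything is re-initialized properly when someone reuses a parser instance.</para>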
    204 206 </callout>  
    205 207 <callout arearefs="dialect.extract.2.2">  
    206   <para><function>start_a</function> is called by &sgmlparser; whenever it finds an <sgmltag>&lt;a&gt;</sgmltag> tag.  The tag may contain an <literal>href</literal> attribute, and/or other attributes, like <literal>name</literal> or <literal>title</literal>.  The <varname>attrs</varname> parameter is a list of tuples, <literal>[(<replaceable>attribute</replaceable>, <replaceable>value</replaceable>), (<replaceable>attribute</replaceable>, <replaceable>value</replaceable>), ...]</literal>.  Or it may be just an <sgmltag>&lt;a&gt;</sgmltag>, a valid (if useless) &html; tag, in which case <varname>attrs</varname> would be an empty list.</para>  
      208 <para><function>start_a</function> is called by &sgmlparser; whenever it finds an <sgmltag>&lt;a&gt;</sgmltag> tag. The tag may contain an <literal>href</literal> attribute, other attributes such as <literal>name</literal> or <literal>title</literal>, or both. The <varname>attrs</varname> parameter is a list of tuples, <literal>[(<replaceable>attribute</replaceable>, <replaceable>value</replaceable>), (<replaceable>attribute</replaceable>, <replaceable>value</replaceable>), ...]</literal>. Or the tag may be just an <sgmltag>&lt;a&gt;</sgmltag>, a valid (if useless) &html; tag, in which case <varname>attrs</varname> is an empty list.</para>
    206 208 </callout>  
    207 209 <callout arearefs="dialect.extract.2.3">  
    208   <para>You can find out whether this <sgmltag>&lt;a&gt;</sgmltag> tag has an <literal>href</literal> attribute with a simple <link linkend="odbchelper.multiassign">multi-variable</link> <link linkend="odbchelper.map">list comprehension</link>.</para>  
      210 <para>You can find out whether this <sgmltag>&lt;a&gt;</sgmltag> tag has an <literal>href</literal> attribute with a simple <link linkend="odbchelper.multiassign">multi-variable</link> <link linkend="odbchelper.map">list comprehension</link>.</para>
    208 210 </callout>  
    209 211 <callout arearefs="dialect.extract.2.4">  
    210   <para>String comparisons like <literal>k=='href'</literal> are always case-sensitive, but that's safe in this case, because &sgmlparser; converts attribute names to lowercase while building <varname>attrs</varname>.</para>  
      212 <para>String comparisons like <literal>k=='href'</literal> are always case-sensitive, but that is safe here, because &sgmlparser; converts attribute names to lowercase while building <varname>attrs</varname>.</para>
    210 212 </callout>  
    211 213 </calloutlist>  
    212 214 </example>  
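The callouts above describe three ideas: initializing in reset, receiving attributes as a list of (name, value) tuples, and filtering them with a list comprehension. A minimal sketch of the same pattern follows. It is not the urllister.py listing itself: sgmllib is unavailable in Python 3, so the sketch assumes Python 3 and uses html.parser.HTMLParser, where a single handle_starttag method receives every start tag instead of a per-tag start_a method. The sample markup is hypothetical.

```python
from html.parser import HTMLParser


class URLLister(HTMLParser):
    def reset(self):
        # Called by __init__ and again whenever the parser is reused,
        # so per-document state belongs here rather than in __init__.
        super().reset()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        # attrs is a list of (name, value) tuples; names arrive lowercased,
        # so comparing against 'href' is safe even for <a HREF="...">.
        href = [v for k, v in attrs if k == "href"]
        if href:
            self.urls.extend(href)


parser = URLLister()
parser.feed('<a HREF="index.html">Home</a> <a name="top">anchor</a> <a>bare</a>')
parser.close()
print(parser.urls)
```

Only the first tag contributes a URL; the named anchor has no href, and the bare tag arrives with an empty attribute list.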
    213 215 <example id="dialect.feed.example">  
    214   <title>Using &urllister_filename;</title>  
      216 <title>Using &urllister_filename;</title>
    214 216 <screen>  
    215 217 &prompt;<userinput>import urllib, urllister</userinput>  
     
    239 241 </computeroutput>  
    240 242  
    241   ... rest of output omitted for brevity ...</screen>  
      243 ......</screen>  
    241 243 <calloutlist>  
    242 244 <callout arearefs="dialect.extract.3.1">  
    243   <para>Call the <function>feed</function> method, defined in &sgmlparser;, to get &html; into the parser.<footnote><para>The technical term for a parser like &sgmlparser; is a <emphasis>consumer</emphasis>: it consumes &html; and breaks it down.  Presumably, the name <function>feed</function> was chosen to fit into the whole <quote>consumer</quote> motif.  Personally, it makes me think of an exhibit in the zoo where there's just a dark cage with no trees or plants or evidence of life of any kind, but if you stand perfectly still and look really closely you can make out two beady eyes staring back at you from the far left corner, but you convince yourself that that's just your mind playing tricks on you, and the only way you can tell that the whole thing isn't just an empty cage is a small innocuous sign on the railing that reads, <quote>Do not feed the parser.</quote>  But maybe that's just me.  In any event, it's an interesting mental image.</para></footnote>  It takes a string, which is what <function>usock.read()</function> returns.</para>  
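      245 <para>Call the <function>feed</function> method, defined in &sgmlparser;, to get &html; into the parser.
      246 <footnote><para>The technical term for a parser like &sgmlparser; is a <emphasis>consumer</emphasis>: it consumes &html; and breaks it down. Presumably the name <function>feed</function> was chosen to fit the whole <emphasis>consumer</emphasis> motif. Personally, it makes me think of an exhibit in the zoo: a dark cage with no trees, no plants, and no sign of life of any kind. If you stand perfectly still and look really closely, you can make out two beady eyes staring back at you from the far corner, but you convince yourself that it is just your mind playing tricks on you. The only way you can tell that the cage is not empty is a small, innocuous sign on the railing that reads <quote>Do not feed the parser</quote>. But maybe that is just me. In any event, it is an interesting mental image.</para></footnote>
      247 The method takes a string, which is what <function>usock.read()</function> returns.</para>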
    244 248 </callout>  
    245 249 <callout arearefs="dialect.extract.3.2">  
    246   <para>Like files, you should &close; your &url; objects as soon as you're done with them.</para>  
      250 <para>As with files, you should &close; your &url; objects as soon as you are done with them.</para>
    246 250 </callout>  
    247 251 <callout arearefs="dialect.extract.3.3">  
    248   <para>You should &close; your parser object, too, but for a different reason.  You've read all the data and fed it to the parser, but the <function>feed</function> method isn't guaranteed to have actually processed all the &html; you give it; it may buffer it, waiting for more.  Be sure to call &close; to flush the buffer and force everything to be fully parsed.</para>  
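      252 <para>You should &close; your parser object too, but for a different reason. You have read all the data and fed it to the parser, but the <function>feed</function> method is not guaranteed to have processed all the &html; you gave it; it may buffer some of it, waiting for more. Once there is no more input, call &close; to flush the buffer and force everything to be fully parsed.</para>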
    248 252 </callout>  
    249 253 <callout arearefs="dialect.extract.3.4">  
    250   <para>Once the parser is &close;d, the parsing is complete, and <varname>parser.urls</varname> contains a list of all the linked &url;s in the &html; document.  (Your output may look different, if the download links have been updated by the time you read this.)</para>  
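      254 <para>Once the parser is &close;d, the parsing is complete, and <varname>parser.urls</varname> contains a list of all the linked &url;s in the &html; document. (Your output may look different if the download links have been updated by the time you read this.)</para>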
    250 254 </callout>  
    251 255 </calloutlist>  
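The buffering behavior described above, where feed may hold on to input until more arrives, can be observed by splitting a tag across two calls. This is a sketch assuming Python 3 and html.parser.HTMLParser as a stand-in for the sgmllib parser; the class name LinkCollector, the variable urls_after_first_chunk, and the sample markup are hypothetical.

```python
from html.parser import HTMLParser


class LinkCollector(HTMLParser):
    def reset(self):
        super().reset()
        self.urls = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            self.urls.extend(v for k, v in attrs if k == "href")


parser = LinkCollector()

# The first chunk ends in the middle of a tag, so the parser cannot
# report it yet and keeps the fragment in its internal buffer.
parser.feed('<a hre')
urls_after_first_chunk = list(parser.urls)

# The second chunk completes the tag; only now is the handler called.
parser.feed('f="toc/index.html">Contents</a>')
parser.close()   # flush anything still buffered

print(urls_after_first_chunk)
print(parser.urls)
```

Nothing is collected after the first call because the tag is incomplete; the link appears only once the rest of the tag has been fed.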
     
    258 262 <section id="dialect.basehtml">  
    259 263 <?dbhtml filename="html_processing/basehtmlprocessor.html"?>  
    260   <title>Introducing &basehtml_filename;</title>  
      264 <title>Introducing &basehtml_filename;</title>
    260 264 <abstract>  
    261 265 <title/>  
    262   <para>&sgmlparser; doesn't produce anything by itself.  It parses and parses and parses, and it calls a method for each interesting thing it finds, but the methods don't do anything.  &sgmlparser; is an &html; <emphasis>consumer</emphasis>: it takes &html; and breaks it down into small, structured pieces.  As you saw in the <link linkend="dialect.extract">previous section</link>, you can subclass &sgmlparser; to define classes that catch specific tags and produce useful things, like a list of all the links on a web page.  Now you'll take this one step further by defining a class that catches everything &sgmlparser; throws at it and reconstructs the complete &html; document.  In technical terms, this class will be an &html; <emphasis>producer</emphasis>.</para>  
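      266 <para>&sgmlparser; does not produce anything by itself. It parses and parses and parses, and it calls a method for each interesting thing it finds, but the methods do nothing. &sgmlparser; is an &html; <emphasis>consumer</emphasis>: it takes &html; and breaks it down into small, structured pieces. As you saw in the <link linkend="dialect.extract">previous section</link>, you can subclass &sgmlparser; to define classes that catch specific tags and produce useful things, such as a list of all the links on a web page. Now you will take this one step further by defining a class that catches everything &sgmlparser; throws at it and reconstructs the complete &html; document. In technical terms, this class will be an &html; <emphasis>producer</emphasis>.</para>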
    262 266 </abstract>  
    263   <para>&basehtml_classname; subclasses &sgmlparser; and provides all 8 essential handler methods: &unknown_starttag;, &unknown_endtag;, &handle_charref;, &handle_entityref;, &handle_comment;, &handle_pi;, &handle_decl;, and &handle_data;.</para>  
      267 <para>&basehtml_classname; subclasses &sgmlparser; and provides all 8 essential handler methods: &unknown_starttag;, &unknown_endtag;, &handle_charref;, &handle_entityref;, &handle_comment;, &handle_pi;, &handle_decl;, and &handle_data;.</para>
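The producer idea can be sketched by giving each of the eight handlers one job: turn its arguments back into markup and append the result to a list. This is an illustration under assumptions rather than the BaseHTMLProcessor listing: it assumes Python 3 and html.parser.HTMLParser in place of sgmllib, so the tag handlers are named handle_starttag and handle_endtag. The class name RoundTripProcessor, the output method, and the sample document are hypothetical.

```python
from html.parser import HTMLParser


class RoundTripProcessor(HTMLParser):
    """Consume HTML and rebuild an equivalent document from the pieces."""

    def __init__(self):
        # Keep entity and character references as separate events so
        # they can be written back out unchanged.
        super().__init__(convert_charrefs=False)

    def reset(self):
        # Collect pieces in a list and join once at the end.
        self.pieces = []
        super().reset()

    def handle_starttag(self, tag, attrs):
        # Sketch only: attributes without a value (value None) are not handled.
        strattrs = "".join(' %s="%s"' % (key, value) for key, value in attrs)
        self.pieces.append("<%s%s>" % (tag, strattrs))

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_entityref(self, name):
        self.pieces.append("&%s;" % name)

    def handle_comment(self, data):
        self.pieces.append("<!--%s-->" % data)

    def handle_pi(self, data):
        self.pieces.append("<?%s>" % data)

    def handle_decl(self, decl):
        self.pieces.append("<!%s>" % decl)

    def handle_data(self, data):
        self.pieces.append(data)

    def output(self):
        return "".join(self.pieces)


source = ('<!DOCTYPE html><html><body><!-- note -->'
          '<p class="intro">Fish &amp; chips&#160;today</p></body></html>')
processor = RoundTripProcessor()
processor.feed(source)
processor.close()
print(processor.output())
```

For this lowercase, fully quoted sample the output matches the input. In general the result is equivalent rather than identical, because tag and attribute names are lowercased during parsing.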
    263 267 <example id="dialect.basehtml.intro">  
    264   <title>Introducing &basehtml_classname;</title>  
      268 <title>Introducing &basehtml_classname;</title>
    264 268 <programlisting>  
    265 269 &basehtml_classdef;  
     
    300 304 <calloutlist>  
    301 305 <callout arearefs="dialect.basehtml.1.1">  
    302   <para>&reset;, called by <function>SGMLParser.__init__</function>, initializes &selfpieces; as an empty list before <link linkend="fileinfo.init.code.example">calling the ancestor method</link>.  &selfpieces; is a <link linkend="fileinfo.userdict.init.example">data attribute</link> which will hold the pieces of the &html; document you're constructing.  Each handler method will reconstruct the &html; that &sgmlparser; parsed, and each method will append that string to &selfpieces;.  Note that &selfpieces; is a list.  You might be tempted to define it as a string and just keep appending each piece to it.  That would work, but &python; is much more efficient at dealing with lists.<footnote><para>The reason &python; is better at lists than strings is that lists are mutable but strings are immutable.  This means that appending to a list just adds the element and updates the index.  Since strings can not be changed after they are created, code like <literal>s = s + newpiece</literal> will create an entirely new string out of the concatenation of the original and the new piece, then throw away the original string.  This involves a lot of expensive memory management, and the amount of effort involved increases as the string gets longer, so doing <literal>s = s + newpiece</literal> in a loop is deadly.  In technical terms, appending <varname>n</varname> items to a list is <literal>O(n)</literal>, while appending <varname>n</varname> items to a string is <literal>O(n<superscript>2</superscript>)</literal>.</para></footnote></para>  
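      306 <para>&reset;, called by <function>SGMLParser.__init__</function>, initializes &selfpieces; as an empty list before <link linkend="fileinfo.init.code.example">calling the ancestor method</link>. &selfpieces; is a <link linkend="fileinfo.userdict.init.example">data attribute</link> that holds the pieces of the &html; document you are constructing. Each handler method reconstructs the &html; that &sgmlparser; parsed and appends the resulting string to &selfpieces;. Note that &selfpieces; is a list. You might be tempted to define it as a string and just keep appending each piece to it. That would work, but &python; is much more efficient at dealing with lists.
      307
      308 <footnote><para>The reason &python; is better at lists than strings is that lists are mutable but strings are immutable. This means that appending to a list just adds the element and updates the index. Since strings cannot be changed after they are created, code like <literal>s = s + newpiece</literal> creates an entirely new string from the concatenation of the original and the new piece, then throws away the original string. This involves a lot of expensive memory management, and the cost grows as the string gets longer, so doing <literal>s = s + newpiece</literal> in a loop is deadly. In technical terms, appending <varname>n</varname> items to a list is <literal>O(n)</literal>, while appending <varname>n</varname> items to a string is <literal>O(n<superscript>2</superscript>)</literal>.</para></footnote></para>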
    303 309 </callout>  
    304 310 <callout arearefs="dialect.basehtml.1.2">  
    305   <para>Since &basehtml_classname; does not define any methods for specific tags (like the <function>start_a</function> method in <link linkend="dialect.extract.links">&urllister_classname;</link>), &sgmlparser; will call &unknown_starttag; for every start tag.  This method takes the tag (<varname>tag</varname>) and the list of attribute name/value pairs (<varname>attrs</varname>), reconstructs the original &html;, and appends it to &selfpieces;.<!--<footnote><para>Technically, what &basehtml_classname; constructs is not guaranteed to be character-for-character identical to the original &html;, but it is <emphasis>equivalent</emphasis> to the original &html;.  &sgmlparser; converts the tag and the attribute names (but not the attribute values) to lowercase, so the string that &unknown_starttag; constructs may not be identical to the original tag, but that shouldn't make any difference because &html; specifies that tags and attribute names are case-insensitive.</para></footnote>-->  The <link linkend="odbchelper.stringformatting">string formatting</link> here is a little strange; you'll untangle that (and also the odd-looking &locals; function) later in this chapter.</para>  
      311 <para>Because &basehtml_classname; does not define methods for specific tags (like the <function>start_a</function> method in <link linkend="dialect.extract.links">&urllister_classname;</link>),
      312 &sgmlparser; calls &unknown_starttag; for every start tag.  This method takes two arguments, the tag (<varname>tag</varname>) and the list of attribute name/value pairs (<varname>attrs</varname>), reconstructs the original &html;, and appends the result to &selfpieces;.<!--<footnote><para>Technically, what &basehtml_classname; constructs is not guaranteed to be character-for-character identical to the original &html;, but it is <emphasis>equivalent</emphasis> to the original &html;.  &sgmlparser; converts the tag and the attribute names (but not the attribute values) to lowercase, so the string that &unknown_starttag; constructs may not be identical to the original tag, but that shouldn't make any difference because &html; specifies that tags and attribute names are case-insensitive.</para></footnote>-->  The <link linkend="odbchelper.stringformatting">string formatting</link> here is a little unfamiliar; it is explained later in this chapter.</para>
    306 313 </callout>  
    307 314 <callout arearefs="dialect.basehtml.1.3">  
    308   <para>Reconstructing end tags is much simpler; just take the tag name and wrap it in the <literal>&lt;/...&gt;</literal> brackets.</para>  
      315 <para>Reconstructing end tags is much simpler: take the tag name and wrap it in <literal>&lt;/...&gt;</literal> brackets.</para>
    308 315 </callout>  
    309 316 <callout arearefs="dialect.basehtml.1.4">  
    310   <para>When &sgmlparser; finds a character reference, it calls &handle_charref; with the bare reference.  If the &html; document contains the reference <literal>&amp;&hash;160;</literal>, <varname>ref</varname> will be <literal>160</literal>.  Reconstructing the original complete character reference just involves wrapping <varname>ref</varname> in <literal>&amp;&hash;...;</literal> characters.</para>  
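      317 <para>When &sgmlparser; finds a character reference, it calls &handle_charref; with the bare reference.  If the &html; document contains the reference <literal>&amp;&hash;160;</literal>, <varname>ref</varname> will be <literal>160</literal>.  Reconstructing the original complete character reference only requires wrapping <varname>ref</varname> in <literal>&amp;&hash;...;</literal> characters.</para>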
    310 317 </callout>  
    311 318 <callout arearefs="dialect.basehtml.1.5">  
    312   <para>Entity references are similar to character references, but without the hash mark.  Reconstructing the original entity reference requires wrapping <varname>ref</varname> in <literal>&amp;...;</literal> characters.  (Actually, as an erudite reader pointed out to me, it's slightly more complicated than this.  Only certain standard &html; entities end in a semicolon; other similar-looking entities do not.  Luckily for us, the set of standard &html; entities is defined in a dictionary in a &python; module called &htmlentitydefs;.  Hence the extra &if; statement.)</para>
      319 <para>Entity references are similar to character references, but without the hash mark.  Reconstructing the original entity reference only requires wrapping <varname>ref</varname> in <literal>&amp;...;</literal> characters.  (Actually, as an erudite reader once pointed out to me, it is slightly more complicated than that.  Only certain standard &html; entities end in a semicolon; other similar-looking entities do not.  Luckily, the set of standard &html; entities is already defined in a &python; module called &htmlentitydefs;.  Hence the extra &if; statement.)</para>
    312 319 </callout>  
    313 320 <callout arearefs="dialect.basehtml.1.6">  
    314   <para>Blocks of text are simply appended to &selfpieces; unaltered.</para>  
      321 <para>Blocks of text are simply appended to &selfpieces; unaltered.</para>
    314 321 </callout>  
    315 322 <callout arearefs="dialect.basehtml.1.7">  
    316   <para>&html; comments are wrapped in <literal>&lt;!--...--&gt;</literal> characters.</para>  
      323 <para>&html; comments are wrapped in <literal>&lt;!--...--&gt;</literal> characters.</para>
    316 323 </callout>  
    317 324 <callout arearefs="dialect.basehtml.1.8">  
    318   <para>Processing instructions are wrapped in <literal>&lt;?...&gt;</literal> characters.</para>  
      325 <para>Processing instructions are wrapped in <literal>&lt;?...&gt;</literal> characters.</para>
    318 325 </callout>  
    319 326 </calloutlist>  
    320 327 </example>  
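The handler methods described in the callouts above can be tried out in a current interpreter with a small stand-in. This is only a sketch under stated assumptions: `sgmllib` no longer ships with Python 3, so the example swaps in `html.parser.HTMLParser` from the standard library, and the names `PieceCollector` and `rebuild` are hypothetical, not part of the original program.

```python
from html.parser import HTMLParser


class PieceCollector(HTMLParser):
    """Hypothetical stand-in: rebuilds the HTML it is fed, piece by piece."""

    def __init__(self):
        # convert_charrefs=False keeps character and entity references
        # as separate events instead of folding them into the text.
        super().__init__(convert_charrefs=False)
        self.pieces = []

    def handle_starttag(self, tag, attrs):
        parts = []
        for key, value in attrs:
            if value is None:          # attribute written without a value
                parts.append(" %s" % key)
            else:
                parts.append(' %s="%s"' % (key, value))
        self.pieces.append("<%s%s>" % (tag, "".join(parts)))

    def handle_endtag(self, tag):
        self.pieces.append("</%s>" % tag)

    def handle_charref(self, name):
        self.pieces.append("&#%s;" % name)

    def handle_entityref(self, name):
        self.pieces.append("&%s;" % name)

    def handle_data(self, data):
        self.pieces.append(data)

    def handle_comment(self, data):
        self.pieces.append("<!--%s-->" % data)

    def handle_pi(self, data):
        self.pieces.append("<?%s>" % data)

    def output(self):
        return "".join(self.pieces)


def rebuild(source):
    """Feed a complete document through the collector and return the result."""
    parser = PieceCollector()
    parser.feed(source)
    parser.close()
    return parser.output()
```

As in the text, every handler only appends to a list; the complete string is produced once, in `output`.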
    321 328 <important>  
    322   <title>Processing &html; with embedded script</title>  
    323   <para>The &html; specification requires that all non-&html; (like client-side &javascript;) must be enclosed in &html; comments, but not all web pages do this properly (and all modern web browsers are forgiving if they don't).  &basehtml_classname; is not forgiving; if script is improperly embedded, it will be parsed as if it were &html;.  For instance, if the script contains less-than and equals signs, &sgmlparser; may incorrectly think that it has found tags and attributes.  &sgmlparser; always converts tags and attribute names to lowercase, which may break the script, and &basehtml_classname; always encloses attribute values in double quotes (even if the original &html; document used single quotes or no quotes), which will certainly break the script.  Always protect your client-side script within &html; comments.</para>  
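      329 <title>Processing &html; with embedded script</title>
      330 <para>The &html; specification requires that all non-&html; (like client-side &javascript;) be enclosed in &html; comments, but not all web pages do this (and all modern browsers tolerate pages that don't).  &basehtml_classname; is not forgiving: if script is improperly embedded, it will be parsed as if it were &html;.  For instance, if the script contains less-than and equals signs, &sgmlparser; may incorrectly think it has found tags and attributes.  &sgmlparser; always converts tag names and attribute names to lowercase, which may break the script, and &basehtml_classname; always encloses attribute values in double quotes (even if the original &html; document used single quotes or no quotes), which will certainly break the script.  Always protect your client-side script within &html; comments.</para>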
    324 331 </important>  
    325 332 <example id="dialect.output.example">  
    326   <title>&basehtml_classname; output</title>  
      333 <title>&basehtml_classname; output</title>
    326 333 <programlisting>  
    327 334 &basehtml_outputdef; <co id="dialect.basehtml.2.1"/>  
     
    337 344 <calloutlist>  
    338 345 <callout arearefs="dialect.basehtml.2.1">  
    339   <para>This is the one method in &basehtml_classname; that is never called by the ancestor &sgmlparser;.  Since the other handler methods store their reconstructed &html; in &selfpieces;, this function is needed to join all those pieces into one string.  As noted before, &python; is great at lists and mediocre at strings, so you only create the complete string when somebody explicitly asks for it.</para>  
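      346 <para>This is the one method in &basehtml_classname; that is never called by the ancestor &sgmlparser;.  Because the other handler methods store their reconstructed &html; in &selfpieces;, this function is needed to join all those pieces into one string.  As noted before, &python; is great at lists and mediocre at strings, so the complete string is only created when somebody explicitly asks for it.</para>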
    339 346 </callout>  
    340 347 <callout arearefs="dialect.basehtml.2.2">  
    341   <para>If you prefer, you could use the &join; method of the &string; module instead: <literal>string.join(self.pieces, "")</literal></para>  
      348 <para>If you prefer, you could use the &join; method of the &string; module instead: <literal>string.join(self.pieces, "")</literal>.</para>
    341 348 </callout>  
    342 349 </calloutlist>  
    343 350 </example>  
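The two ways of assembling the output mentioned above can be put side by side. This is a minimal sketch; the function names are made up for illustration, and no timing claims are made here, only the shape of each approach.

```python
def build_with_list(chunks):
    """Collect pieces in a list and join them once at the end."""
    pieces = []
    for chunk in chunks:
        pieces.append(chunk)
    return "".join(pieces)


def build_with_concat(chunks):
    """Rebind s to a brand-new string on every pass through the loop."""
    s = ""
    for chunk in chunks:
        s = s + chunk
    return s
```

Both return the same string; the first one defers creating it until somebody asks for it, which is the approach the output method takes.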
    344 351 <itemizedlist role="furtherreading">  
    345   <title>Further reading</title>  
    346   <listitem><para>&w3c; discusses <ulink url="&url_w3c;TR/REC-html40/charset.html#entities">character and entity references</ulink>.</para></listitem>  
    347   <listitem><para>&pythonlibraryreference; confirms your suspicions that <ulink url="&url_pythonlibraryreference;module-htmlentitydefs.html">the &htmlentitydefs; module</ulink> is exactly what it sounds like.</para></listitem>  
      352 <title>Further reading</title>
      353 <listitem><para>&w3c; discusses <ulink url="&url_w3c;TR/REC-html40/charset.html#entities">character and entity references</ulink>.</para></listitem>
      354 <listitem><para>&pythonlibraryreference; confirms your suspicions that <ulink url="&url_pythonlibraryreference;module-htmlentitydefs.html">the &htmlentitydefs; module</ulink> is exactly what it sounds like.</para></listitem>
    348 355 </itemizedlist>  
    349 356 </section>  
    350 357 <section id="dialect.locals">  
    351 358 <?dbhtml filename="html_processing/locals_and_globals.html"?>  
    352   <title>&locals; and &globals;</title>  
      359 <title>&locals; and &globals;</title>
    352 359 <abstract>  
    353 360 <title/>  
    354   <para>Let's digress from <acronym>HTML</acronym> processing for a minute and talk about how &python; handles variables.  &python; has two built-in functions, &locals; and &globals;, which provide dictionary-based access to local and global variables.</para>  
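      361 <para>Let's digress from <acronym>HTML</acronym> processing for a minute and talk about how &python; handles variables.  &python; has two built-in functions, &locals; and &globals;, which provide dictionary-based access to local and global variables.</para>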
    354 361 </abstract>  
    355   <para>Remember &locals;?  You first saw it here:</para>  
      362 <para>Remember &locals;?  You first saw it here:</para>
    355 362 <informalexample>  
    356 363 <programlisting>  
     
    365 372 </programlisting>  
    366 373 </informalexample>  
    367   <para>No, wait, you can't learn about &locals; yet.  First, you need to learn about namespaces.  This is dry stuff, but it's important, so pay attention.</para>  
    368   <para>&python; uses what are called namespaces to keep track of variables.  A namespace is just like a dictionary where the keys are names of variables and the dictionary values are the values of those variables.  In fact, you can access a namespace as a &python; dictionary, as you'll see in a minute.</para>  
    369   <para>At any particular point in a &python; program, there are several namespaces available.  Each function has its own namespace, called the local namespace, which keeps track of the function's variables, including function arguments and locally defined variables.  Each module has its own namespace, called the global namespace, which keeps track of the module's variables, including functions, classes, any other imported modules, and module-level variables and constants.  And there is the built-in namespace, accessible from any module, which holds built-in functions and exceptions.</para>  
    370   <para>When a line of code asks for the value of a variable <varname>x</varname>, &python; will search for that variable in all the available namespaces, in order:</para>  
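      374 <para>No, wait, you can't learn about &locals; yet.  First, you need to learn about namespaces.  This is dry stuff, but it's important, so be patient.</para>
      375 <para>&python; uses what are called namespaces to keep track of variables.  A namespace is just like a dictionary where the keys are the names of variables and the values are the values of those variables.  In fact, you can access a namespace as a &python; dictionary, as you'll see in a minute.</para>
      376 <para>At any particular point in a &python; program, there are several namespaces available.  Each function has its own namespace, called the local namespace, which keeps track of the function's variables, including function arguments and locally defined variables.  Each module has its own namespace, called the global namespace, which keeps track of the module's variables, including functions, classes, any other imported modules, and module-level variables and constants.  And there is the built-in namespace, accessible from any module, which holds built-in functions and exceptions.</para>
      377 <para>When a line of code asks for the value of a variable <varname>x</varname>, &python; searches for that variable in all the available namespaces, in this order:</para>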
    371 378 <orderedlist>  
    372   <listitem><para>local namespace - specific to the current function or class method.  If the function defines a local variable <varname>x</varname>, or has an argument <varname>x</varname>, &python; will use this and stop searching.</para></listitem>  
    373   <listitem><para>global namespace - specific to the current module.  If the module has defined a variable, function, or class called <varname>x</varname>, &python; will use that and stop searching.</para></listitem>  
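    374   <listitem><para>built-in namespace - global to all modules.  As a last resort, &python; will assume that <varname>x</varname> is the name of a built-in function or variable.</para></listitem>
      379 <listitem><para>local namespace - specific to the current function or class method.  If the function defines a local variable <varname>x</varname>, or has an argument <varname>x</varname>, &python; will use this and stop searching.</para></listitem>
      380 <listitem><para>global namespace - specific to the current module.  If the module has defined a variable, function, or class called <varname>x</varname>, &python; will use that and stop searching.</para></listitem>
      381 <listitem><para>built-in namespace - global to all modules.  As a last resort, &python; will assume that <varname>x</varname> is the name of a built-in function or variable.</para></listitem>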
    375 382 </orderedlist>  
    376   <para>If &python; doesn't find <varname>x</varname> in any of these namespaces, it gives up and raises a <errorcode>NameError</errorcode> with the message <errorname>There is no variable named 'x'</errorname>, which you saw back in <xref linkend="odbchelper.unboundvariable"/>, but you didn't appreciate how much work &python; was doing before giving you that error.</para>  
      383 <para>If &python; doesn't find <varname>x</varname> in any of these namespaces, it gives up and raises a <errorcode>NameError</errorcode> exception with the message <errorname>There is no variable named 'x'</errorname>, which you saw back in <xref linkend="odbchelper.unboundvariable"/>.  At the time, though, you had no idea how much work &python; was doing before giving you that error.</para>
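The search order described above can be watched in action with a few small functions. This is an illustrative sketch; all names here are invented for the example, and the exact wording of the `NameError` message varies between Python versions, so the code only reports that the exception was raised.

```python
x = "global x"


def uses_local():
    x = "local x"       # found in the local namespace; the search stops here
    return x


def uses_global():
    return x            # no local x, so the module's namespace supplies it


def uses_builtin():
    return len("abc")   # len is neither local nor global; it is a built-in


def uses_missing():
    try:
        return name_that_is_defined_nowhere
    except NameError as err:
        # All three namespaces were searched before Python gave up.
        return "NameError: %s" % err
```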
    376 383 <important>  
    377   <title>Language evolution: nested scopes</title>  
    378   <para>&python; 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes.  In versions of &python; prior to 2.2, when you reference a variable within a <link linkend="fileinfo.nested">nested function</link> or <link linkend="apihelper.lambda">&lambdafunction; function</link>, &python; will search for that variable in the current (nested or &lambdafunction;) function's namespace, then in the module's namespace.  &python; 2.2 will search for the variable in the current (nested or &lambdafunction;) function's namespace, <emphasis>then in the parent function's namespace</emphasis>, then in the module's namespace.  &python; 2.1 can work either way; by default, it works like &python; 2.0, but you can add the following line of code at the top of your module to make your module work like &python; 2.2:</para>  
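      384 <title>Language evolution: nested scopes</title>
      385 <para>&python; 2.2 introduced a subtle but important change that affects the namespace search order: nested scopes.
      386
      387 In versions of &python; prior to 2.2, when you reference a variable within a <link linkend="fileinfo.nested">nested function</link> or <link linkend="apihelper.lambda">&lambdafunction; function</link>, &python; searches for that variable in the current (nested or &lambdafunction;) function's namespace, then in the module's namespace.  &python; 2.2 searches for the variable in the current (nested or &lambdafunction;) function's namespace, <emphasis>then in the parent function's namespace</emphasis>, then in the module's namespace.  &python; 2.1 can work either way; by default, it works like &python; 2.0, but you can add the following line of code at the top of your module to make your module work like &python; 2.2:</para>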
    379 388 <programlisting>  
    380 389 from __future__ import nested_scopes</programlisting>  
    381 390 </important>  
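A short sketch of what nested scopes make possible, assuming an interpreter where they are always enabled (Python 2.2 and every later version, so no `__future__` import is needed). The function names are made up for illustration.

```python
def make_greeter(greeting):
    def greet(name):
        # greeting is not local to greet and not a module-level name;
        # it is found in the parent function's namespace.
        return "%s, %s!" % (greeting, name)
    return greet
```

Calling `make_greeter("Hello")("world")` returns `'Hello, world!'`. Under the pre-2.2 rules described above, the lookup of `greeting` inside `greet` would have skipped the parent function and failed.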
    382   <para>Are you confused yet?  Don't despair!  This is really cool, I promise.  Like many things in &python;, namespaces are <emphasis>directly accessible at run-time</emphasis>.  How?  Well, the local namespace is accessible via the built-in &locals; function, and the global (module level) namespace is accessible via the built-in &globals; function.</para>  
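      391 <para>Are you confused yet?  Don't despair!  This is really cool, I promise.  Like many things in &python;, namespaces are <emphasis>directly accessible at run-time</emphasis>.  How?  The local namespace is accessible via the built-in &locals; function, and the global (module level) namespace is accessible via the built-in &globals; function.</para>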
    382 391 <example>  
    383   <title>Introducing &locals;</title>  
      392 <title>Introducing &locals;</title>
    383 392 <screen>&prompt;<userinput>def foo(arg):</userinput> <co id="dialect.locals.1.1"/>  
    384 393 &continuationprompt;<userinput>x = 1</userinput>  
     
    394 403 <calloutlist>  
    395 404 <callout arearefs="dialect.locals.1.1">  
    396   <para>The function <function>foo</function> has two variables in its local namespace: <varname>arg</varname>, whose value is passed in to the function, and <varname>x</varname>, which is defined within the function.</para>  
      405 <para>The function <function>foo</function> has two variables in its local namespace: <varname>arg</varname>, whose value is passed in to the function, and <varname>x</varname>, which is defined within the function.</para>
    396 405 </callout>  
    397 406 <callout arearefs="dialect.locals.1.2">  
    398   <para>&locals; returns a dictionary of name/value pairs.  The keys of this dictionary are the names of the variables as strings; the values of the dictionary are the actual values of the variables.  So calling <function>foo</function> with <literal>7</literal> prints the dictionary containing the function's two local variables: <varname>arg</varname> (<literal>7</literal>) and <varname>x</varname> (&one;).</para>  
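      407 <para>&locals; returns a dictionary of name/value pairs.  The keys of this dictionary are the names of the variables as strings; the values of the dictionary are the actual values of the variables.  So calling <function>foo</function> with <literal>7</literal> prints the dictionary containing the function's two local variables: <varname>arg</varname> (<literal>7</literal>) and <varname>x</varname> (&one;).</para>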
    398 407 </callout>  
    399 408 <callout arearefs="dialect.locals.1.3">  
    400   <para>Remember, &python; has dynamic typing, so you could just as easily pass a string in for <varname>arg</varname>; the function (and the call to &locals;) would still work just as well.  &locals; works with all variables of all datatypes.</para>  
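      409 <para>Remember, &python; has dynamic typing, so you could just as easily pass a string in for <varname>arg</varname>; the function (and the call to &locals;) would still work just as well.  &locals; works with variables of all datatypes.</para>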
    400 409 </callout>  
    401 410 </calloutlist>  
    402 411 </example>  
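For readers following along in a current interpreter, here is a sketch of the same idea that returns the dictionary instead of printing it, so the result can be inspected. It is written for Python 3 and is not the listing from the example above.

```python
def foo(arg):
    x = 1
    # Copy the mapping so the caller gets an ordinary dictionary.
    return dict(locals())
```

`foo(7)` yields a dictionary with the keys `'arg'` and `'x'`; passing a string such as `foo('bar')` works just as well, since `locals()` does not care about datatypes.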
    403   <para>What &locals; does for the local (function) namespace, &globals; does for the global (module) namespace.  &globals; is more exciting, though, because a module's namespace is more exciting.<footnote><para>I don't get out much.</para></footnote>  Not only does the module's namespace include module-level variables and constants, it includes all the functions and classes defined in the module.  Plus, it includes anything that was imported into the module.</para>  
    404   <para>Remember the difference between <link linkend="fileinfo.fromimport">&frommoduleimport;</link> and <link linkend="odbchelper.import">&importmodule;</link>?  With &importmodule;, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access any of its functions or attributes: <literal><replaceable>module</replaceable>.<replaceable>function</replaceable></literal>.  But with &frommoduleimport;, you're actually importing specific functions and attributes from another module into your own namespace, which is why you access them directly without referencing the original module they came from.  With the &globals; function, you can actually see this happen.</para>  
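      412 <para>What &locals; does for the local (function) namespace, &globals; does for the global (module) namespace.  &globals; is more exciting, though, because a module's namespace is more exciting.<footnote><para>I don't get out much.</para></footnote>  Not only does the module's namespace include module-level variables and constants, it also includes all the functions and classes defined in the module.  Plus, it includes anything that was imported into the module.</para>
      413 <para>Remember the difference between <link linkend="fileinfo.fromimport">&frommoduleimport;</link> and <link linkend="odbchelper.import">&importmodule;</link>?  With &importmodule;, the module itself is imported, but it retains its own namespace, which is why you need to use the module name to access any of its functions or attributes: <literal><replaceable>module</replaceable>.<replaceable>function</replaceable></literal>.  But with &frommoduleimport;, you're actually importing specific functions and attributes from another module into your own namespace, which is why you access them directly without referencing the module they came from.  With the &globals; function, you can actually see this happen.</para>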
    405 414 <example id="dialect.globals.example">  
    406   <title>Introducing &globals;</title>  
    407   <para>Look at the following block of code at the bottom of &basehtml_filename;:</para>  
      415 <title>Introducing &globals;</title>
      416 <para>Look at the following block of code at the bottom of &basehtml_filename;:</para>
    408 417 <programlisting>  
    409 418 if __name__ == "__main__":  
     
    415 424 <calloutlist>  
    416 425 <callout arearefs="dialect.locals.2.1">  
    417   <para>Just so you don't get intimidated, remember that you've seen all this before.  The &globals; function returns a dictionary, and you're <link linkend="dictionaryiter.example">iterating through the dictionary</link> using the &items; method and <link linkend="odbchelper.multiassign">multi-variable assignment</link>.  The only thing new here is the &globals; function.</para>  
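      426 <para>Don't be intimidated; remember that you've seen all this before.  The &globals; function returns a dictionary, and you're using the &items; method and <link linkend="odbchelper.multiassign">multi-variable assignment</link> to <link linkend="dictionaryiter.example">iterate through the dictionary</link>.  The only new thing here is the &globals; function.</para>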
    417 426 </callout>  
    418 427 </calloutlist>  
    419   <para>Now running the script from the command line gives this output (note that your output may be slightly different, depending on your platform and where you installed &python;):</para>  
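      428 <para>Now running the script from the command line gives the output below (note that your output may be slightly different, depending on your platform and where you installed &python;):</para>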
    419 428 <screen><prompt>c:\docbook\dip\py></prompt> <userinput>python BaseHTMLProcessor.py</userinput></screen>  
    420 429 <programlisting>  
     
    425 434 BaseHTMLProcessor = __main__.BaseHTMLProcessor <co id="dialect.locals.3.3"/>  
    426 435 __name__ = __main__                            <co id="dialect.locals.3.4"/>  
    427   ... rest of output omitted for brevity...</programlisting>  
      436 ......</programlisting>  
    427 436 <calloutlist>  
    428 437 <callout arearefs="dialect.locals.3.1">  
    429   <para>&sgmlparser; was imported from &sgmllib_modulename;, using &frommoduleimport;.  That means that it was imported directly into the module's namespace, and here it is.</para>  
      438 <para>&sgmlparser; was imported from &sgmllib_modulename;, using &frommoduleimport;.  That means it was imported directly into the module's namespace, and here it is.</para>
    429 438 </callout>  
    430 439 <callout arearefs="dialect.locals.3.2">  
    431   <para>Contrast this with &htmlentitydefs;, which was imported using &import;.  That means that the &htmlentitydefs; module itself is in the namespace, but the <varname>entitydefs</varname> variable defined within &htmlentitydefs; is not.</para>  
      440 <para>Contrast this with &htmlentitydefs;, which was imported using &import;.  That means that the &htmlentitydefs; module itself is in the namespace, but the <varname>entitydefs</varname> variable defined within &htmlentitydefs; is not.</para>
    431 440 </callout>  
    432 441 <callout arearefs="dialect.locals.3.3">  
    433   <para>This module only defines one class, &basehtml_classname;, and here it is.  Note that the value here is <link linkend="fileinfo.classattributes.intro">the class itself</link>, not a specific instance of the class.</para>  
      442 <para>This module only defines one class, &basehtml_classname;, and here it is.  Note that the value here is <link linkend="fileinfo.classattributes.intro">the class itself</link>, not a specific instance of the class.</para>
    433 442 </callout>  
    434 443 <callout arearefs="dialect.locals.3.4">  
    435   <para>Remember the <link linkend="odbchelper.ifnametrick"><literal>if &name;</literal> trick</link>?  When running a module (as opposed to importing it from another module), the built-in &name; attribute is a special value, &main;.  Since you ran this module as a script from the command line, &name; is &main;, which is why the little test code to print the &globals; got executed.</para>  
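      444 <para>Remember the <link linkend="odbchelper.ifnametrick"><literal>if &name;</literal> trick</link>?  When running a module (as opposed to importing it from another module), the built-in &name; attribute is a special value, &main;.  Since you ran this module as a script from the command line, &name; is &main;, which is why the little test code that prints the &globals; got executed.</para>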
    435 444 </callout>  
    436 445 </calloutlist>  
    437 446 </example>  
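The difference between the two import styles can also be checked programmatically. The sketch below is an illustration under assumptions: instead of calling `globals()` on a real module, it runs a two-line module body with `exec` in a fresh dictionary, which then plays the role of that module's global namespace. The modules `string` and `os.path` are stand-ins chosen because they exist in every Python 3 installation; the variable names are hypothetical.

```python
module_source = """
import string
from os.path import join
"""

# The dictionary passed to exec serves as the global namespace of the
# code being run, which is what globals() would return inside it.
namespace = {}
exec(module_source, namespace)

# Ignore the double-underscore entries Python adds on its own.
names = sorted(name for name in namespace if not name.startswith("__"))

module_is_present = "string" in namespace              # import string
function_is_present = "join" in namespace              # from os.path import join
module_contents_absent = "ascii_letters" not in namespace
```

The module `string` itself is in the namespace, but `ascii_letters`, which is defined within it, is not; `join` was imported directly and is reachable without a module prefix.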
    438 447 <note id="tip.localsbyname">  
    439   <title>Accessing variables dynamically</title>  
    440   <para>Using the &locals; and &globals; functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string.  This mirrors the functionality of the <link linkend="apihelper.getattr">&getattr;</link> function, which allows you to access arbitrary functions dynamically by providing the function name as a string.</para>  
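      448 <title>Accessing variables dynamically</title>
      449 <para>Using the &locals; and &globals; functions, you can get the value of arbitrary variables dynamically, providing the variable name as a string.  This mirrors the functionality of the <link linkend="apihelper.getattr">&getattr;</link> function, which allows you to access arbitrary functions dynamically by providing the function name as a string.</para>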
    441 450 </note>  
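A small sketch of the parallel drawn in the note above. The helper `value_of` and the function `demo` are hypothetical names used only for illustration.

```python
import math


def value_of(name, namespace):
    """Look up a variable by its name in a namespace dictionary."""
    return namespace[name]


def demo():
    radius = 2.0
    # Variable looked up by name, through the dictionary locals() returns.
    by_name = value_of("radius", locals())
    # Function looked up by name, through getattr on the module object.
    sqrt = getattr(math, "sqrt")
    return by_name, sqrt(16)
```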
    442   <para>There is one other important difference between the &locals; and &globals; functions, which you should learn now before it bites you.  It will bite you anyway, but at least then you'll remember learning it.</para>  
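      451 <para>There is one other important difference between the &locals; and &globals; functions, which you should learn now before it bites you.  It will bite you anyway, but at least then you'll remember learning it.</para>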
    442 451 <example id="dialect.locals.readonly.example">  
    443   <title>&locals; is read-only, &globals; is not</title>  
      452 <title>&locals; is read-only, &globals; is not</title>
    443 452 <programlisting>  
    444 453 def foo(arg):  
     
    463 472 <calloutlist>  
    464 473 <callout arearefs="dialect.locals.4.1">  
    465   <para>Since <function>foo</function> is called with <literal>3</literal>, this will print <literal>{'arg': 3, 'x': 1}</literal>.  This should not be a surprise.</para>  
      474 <para>Since <function>foo</function> is called with <literal>3</literal>, this will print <literal>{'arg': 3, 'x': 1}</literal>.  This should not be a surprise.</para>
    465 474 </callout>  
    466 475 <callout arearefs="dialect.locals.4.2">  
    467   <para>&locals; is a function that returns a dictionary, and here you are setting a value in that dictionary.  You might think that this would change the value of the local variable <varname>x</varname> to <literal>2</literal>, but it doesn't.  &locals; does not actually return the local namespace, it returns a copy.  So changing it does nothing to the value of the variables in the local namespace.</para>  
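      476 <para>&locals; is a function that returns a dictionary, and here you are setting a value in that dictionary.  You might think that this would change the value of the local variable <varname>x</varname> to <literal>2</literal>, but it doesn't.  &locals; does not actually return the local namespace; it returns a copy.  So changing it does nothing to the values of the variables in the local namespace.</para>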
    467 476 </callout>  
    468 477 <callout arearefs="dialect.locals.4.3">  
    469   <para>This prints <literal>x= 1</literal>, not <literal>x= 2</literal>.</para>  
      478 <para>This prints <literal>x= 1</literal>, not <literal>x= 2</literal>.</para>
    469 478 </callout>  
    470 479 <callout arearefs="dialect.locals.4.4">  
    471   <para>After being burned by &locals;, you might think that this <emphasis>wouldn't</emphasis> change the value of <varname>z</varname>, but it does.  Due to internal differences in how &python; is implemented (which I'd rather not go into, since I don't fully understand them myself), &globals; returns the actual global namespace, not a copy: the exact opposite behavior of &locals;.  So any changes to the dictionary returned by &globals; directly affect your global variables.</para>  
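      480 <para>After being burned by &locals;, you might think that this <emphasis>wouldn't</emphasis> change the value of <varname>z</varname>, but it does.  Due to internal differences in how &python; is implemented (which I'd rather not go into, since I don't fully understand them myself), &globals; returns the actual global namespace, not a copy: the exact opposite behavior of &locals;.  So any changes to the dictionary returned by &globals; directly affect your global variables.</para>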
    471 480 </callout>  
    472 481 <callout arearefs="dialect.locals.4.5">  
    473   <para>This prints <literal>z= 8</literal>, not <literal>z= 7</literal>.</para>  
      482 <para>This prints <literal>z= 8</literal>, not <literal>z= 7</literal>.</para>
    473 482 </callout>  
    474 483 </calloutlist>  
     
    482 491 <section id="dialect.dictsub">  
    483 492 <?dbhtml filename="html_processing/dictionary_based_string_formatting.html"?>  
    484   <title>Dictionary-based string formatting</title>  
    485   <para>Why did you learn about &locals; and &globals;?  So you can learn about dictionary-based string formatting.  As you recall, <link linkend="odbchelper.stringformatting">regular string formatting</link> provides an easy way to insert values into strings.  Values are listed in a tuple and inserted in order into the string in place of each formatting marker.  While this is efficient, it is not always the easiest code to read, especially when multiple values are being inserted.  You can't simply scan through the string in one pass and understand what the result will be; you're constantly switching between reading the string and reading the tuple of values.</para>  
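      493 <title>Dictionary-based string formatting</title>
      494 <para>Why did you learn about &locals; and &globals;?  So you can learn about dictionary-based string formatting.  As you recall, <link linkend="odbchelper.stringformatting">regular string formatting</link> provides an easy way to insert values into strings.  Values are listed in a tuple and inserted in order into the string in place of each formatting marker.  While this is efficient, it is not always the easiest code to read, especially when multiple values are being inserted.  You can't simply scan through the string in one pass and understand what the result will be; you're constantly switching between reading the string and reading the tuple of values.</para>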
    486 495 <abstract>  
    487 496 <title/>  
    488   <para>There is an alternative form of string formatting that uses dictionaries instead of tuples of values.</para>  
      497 <para>There is an alternative form of string formatting that uses dictionaries instead of tuples of values.</para>
    488 497 </abstract>  
    489 498 <example>  
    490   <title>Introducing dictionary-based string formatting</title>  
      499 <title>Introducing dictionary-based string formatting</title>
    490 499 <screen>  
    491 500 &prompt;<userinput>params = {"server":"mpilgrim", "database":"master", "uid":"sa", "pwd":"secret"}</userinput>  
     
    500 509 <calloutlist>  
    501 510 <callout arearefs="dialect.dictsub.1.1">  
    502   <para>Instead of a tuple of explicit values, this form of string formatting uses a dictionary, <varname>params</varname>.  And instead of a simple <literal>&pct;s</literal> marker in the string, the marker contains a name in parentheses.  This name is used as a key in the <varname>params</varname> dictionary and substitutes the corresponding value, <literal>secret</literal>, in place of the <literal>&pct;(pwd)s</literal> marker.</para>
      511 <para>Instead of a tuple of explicit values, this form of string formatting uses a dictionary, <varname>params</varname>.  And instead of a simple <literal>&pct;s</literal> marker in the string, the marker contains a name in parentheses.  This name is used as a key in the <varname>params</varname> dictionary, and the corresponding value, <literal>secret</literal>, is substituted in place of the <literal>&pct;(pwd)s</literal> marker.</para>
    502 511 </callout>  
    503 512 <callout arearefs="dialect.dictsub.1.2">  
    504   <para>Dictionary-based string formatting works with any number of named keys.  Each key must exist in the given dictionary, or the formatting will fail with a <errorcode>KeyError</errorcode>.</para>  
      513 <para>Dictionary-based string formatting works with any number of named keys.  Each key must exist in the given dictionary, or the formatting will fail with a <errorcode>KeyError</errorcode> exception.</para>
    504 513 </callout>  
    505 514 <callout arearefs="dialect.dictsub.1.3">  
    506   <para>You can even specify the same key twice; each occurrence will be replaced with the same value.</para>  
      515 <para>You can even specify the same key twice; each occurrence will be replaced with the same value.</para>
    506 515 </callout>  
    507 516 </calloutlist>  
    508 517 </example>  
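The three points from the callouts above, gathered into one runnable sketch. The `params` dictionary is the one from the example; the format strings and variable names are made up for illustration.

```python
params = {"server": "mpilgrim", "database": "master", "uid": "sa", "pwd": "secret"}

# A single named marker.
single = "%(pwd)s" % params

# Several named markers in one string.
several = "user=%(uid)s password=%(pwd)s" % params

# The same key used twice.
repeated = "%(database)s and %(database)s again" % params

# A key that is not in the dictionary makes the formatting fail.
try:
    "%(port)s" % params
    missing_key_error = None
except KeyError as err:
    missing_key_error = err.args[0]
```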
    509   <para>So why would you use dictionary-based string formatting?  Well, it does seem like overkill to set up a dictionary of keys and values simply to do string formatting in the next line; it's really most useful when you happen to have a dictionary of meaningful keys and values already.  Like <link linkend="dialect.locals">&locals;</link>.</para>  
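      518 <para>So why would you use dictionary-based string formatting?  Well, it does seem like overkill to set up a dictionary of keys and values simply to do string formatting on the next line; it's really most useful when you happen to have a dictionary of meaningful keys and values already.  Like <link linkend="dialect.locals">&locals;</link>.</para>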
    509 518 <example id="dialect.unknownstarttag">  
    510   <title>Dictionary-based string formatting in &basehtml_filename;</title>  
      519 <title>Dictionary-based string formatting in &basehtml_filename;</title>
    510 519 <programlisting>  
    511 520 &basehtml_commentdef;  
     
    519 528 <calloutlist>  
    520 529 <callout arearefs="dialect.dictsub.2.1">  
    521   <para>Using the built-in &locals; function is the most common use of dictionary-based string formatting.  It means that you can use the names of local variables within your string (in this case, <varname>text</varname>, which was passed to the class method as an argument) and each named variable will be replaced by its value.  If <varname>text</varname> is <literal>'Begin page footer'</literal>, the string formatting <literal>"&lt;!--&pct;(text)s-->" &pct; locals()</literal> will resolve to the string <literal>'&lt;!--Begin page footer-->'</literal>.</para>  
      530 <para>使用内置的 &locals; 函数是基于 dictionary 的字符串格式化最常见的用法。这就是说您可以在字符串中使用局部变量的名字 (本例中是 <varname>text</varname>,它作为一个参数传递给类方法),并且每个命名的变量将会被它的值替换。如果 <varname>text</varname> 是 <literal>'Begin page footer'</literal>,字符串格式化 <literal>"&lt;!--&pct;(text)s-->" &pct; locals()</literal> 将得到字符串 <literal>'&lt;!--Begin page footer-->'</literal>。</para>  
    521 530 </callout>  
    522 531 </calloutlist>  
    523 532 </example>  
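The callouts above can be tried in isolation. The following is a minimal sketch in Python 3; the function names `format_comment` and `format_missing_key` are hypothetical and are not part of the chapter's module. It shows `locals()` supplying the `text` key, and the `KeyError` that results when a named key is absent from the dictionary.

```python
def format_comment(text):
    # Inside this function, locals() is {'text': text},
    # so %(text)s is replaced by the argument's value.
    return "<!--%(text)s-->" % locals()


def format_missing_key():
    # Every named key must exist in the dictionary,
    # otherwise the formatting fails with a KeyError.
    try:
        return "%(missing)s" % {"text": "unused"}
    except KeyError as exc:
        return "KeyError: %s" % exc


print(format_comment("Begin page footer"))
print(format_missing_key())
```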
    524 533 <example>  
    525   <title>More dictionary-based string formatting</title>  
      534 <title>基于 dictionary 的字符串格式化的更多内容</title>  
    525 534 <programlisting>  
    526 535 &basehtml_starttagdef;  
    527 536 &basehtml_starttagjoin; <co id="dialect.dictsub.3.1"/>  
    528   &basehtml_starttagcode;                      <co id="dialect.dictsub.3.2"/>  
      537 &basehtml_starttagcode;                       <co id="dialect.dictsub.3.2"/>  
    528 537 </programlisting>  
    529 538 <calloutlist>  
    530 539 <callout arearefs="dialect.dictsub.3.1">  
    531   <para>When this method is called, <varname>attrs</varname> is a list of key/value tuples, just like the <link linkend="odbchelper.items">&items; of a dictionary</link>, which means you can use <link linkend="odbchelper.multiassign">multi-variable assignment</link> to iterate through it.  This should be a familiar pattern by now, but there's a lot going on here, so let's break it down:</para>  
      540 <para>当这个方法被调用时,<varname>attrs</varname> 是一个键/值 tuple 的 list,就象一个 <link linkend="odbchelper.items">dictionary 的 &items;</link>。这就意味着我们可以使用 <link linkend="odbchelper.multiassign">多变量赋值</link> 来遍历它。到现在这应该是一种熟悉的模式了,但是这里做了很多事情,让我们分开来看:</para>  
    531 540 <orderedlist numeration="loweralpha">  
    532   <listitem><para>Suppose <varname>attrs</varname> is <literal>[('href', 'index.html'), ('title', 'Go to home page')]</literal>.</para></listitem>  
    533   <listitem><para>In the first round of the list comprehension, <varname>key</varname> will get <literal>'href'</literal>, and <varname>value</varname> will get <literal>'index.html'</literal>.</para></listitem>  
    534   <listitem><para>The string formatting <literal>'&nbsp;&pct;s="&pct;s"' &pct; (key, value)</literal> will resolve to <literal>'&nbsp;href="index.html"'</literal>.  This string becomes the first element of the list comprehension's return value.</para></listitem>  
    535   <listitem><para>In the second round, <varname>key</varname> will get <literal>'title'</literal>, and <varname>value</varname> will get <literal>'Go to home page'</literal>.</para></listitem>  
    536   <listitem><para>The string formatting will resolve to <literal>' title="Go to home page"'</literal>.</para></listitem>  
    537   <listitem><para>The list comprehension returns a list of these two resolved strings, and <varname>strattrs</varname> will join both elements of this list together to form <literal>'&nbsp;href="index.html" title="Go to home page"'</literal>.</para></listitem>  
      541 <listitem><para>假设 <varname>attrs</varname> 是 <literal>[('href', 'index.html'), ('title', 'Go to home page')]</literal>。 </para></listitem>  
      542 <listitem><para>在这个列表理解的第一轮循环中,<varname>key</varname> 将为 <literal>'href'</literal>,<varname>value</varname> 将为 <literal>'index.html'</literal>。</para></listitem>  
      543 <listitem><para>字符串格式化 <literal>'&nbsp;&pct;s="&pct;s"' &pct; (key, value)</literal> 将生成 <literal>'&nbsp;href="index.html"'</literal>。这个字符串就作为这个列表理解返回值的第一个元素。</para></listitem>  
      544 <listitem><para>在第二轮中,<varname>key</varname> 将为 <literal>'title'</literal>,<varname>value</varname> 将为 <literal>'Go to home page'</literal>。</para></listitem>  
      545 <listitem><para>字符串格式化将生成 <literal>' title="Go to home page"'</literal>。</para></listitem>  
      546 <listitem><para>这个 list 理解返回两个生成的字符串 list,并且 <varname>strattrs</varname> 将把这个 list 的两个元素连接在一起形成 <literal>'&nbsp;href="index.html" title="Go to home page"'</literal>。</para></listitem>  
    538 547 </orderedlist>  
    539 548 </callout>  
    540 549 <callout arearefs="dialect.dictsub.3.2">  
    541   <para>Now, using dictionary-based string formatting, you insert the value of <varname>tag</varname> and <varname>strattrs</varname> into a string.  So if <varname>tag</varname> is <literal>'a'</literal>, the final result would be <literal>'&lt;a href="index.html" title="Go to home page">'</literal>, and that is what gets appended to &selfpieces;.</para>  
      550 <para>现在,使用基于 dictionary 的字符串格式化,我们将 <varname>tag</varname> 和 <varname>strattrs</varname> 的值插入到一个字符串中。所以,如果 <varname>tag</varname> 是 <literal>'a'</literal>,最终的结果会是 <literal>'&lt;a href="index.html" title="Go to home page">'</literal>,并且这就是追加到 &selfpieces; 后面的东西。</para>  
    541 550 </callout>  
    542 551 </calloutlist>  
    543 552 </example>  
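The walk-through in the two callouts above can be reproduced as a standalone function. This is a sketch under the assumption that the method body does nothing beyond the join and the dictionary-based formatting described; the name `build_start_tag` is hypothetical, and the real method appends the result to its output buffer instead of returning it.

```python
def build_start_tag(tag, attrs):
    # attrs is a list of (key, value) tuples; each pair becomes ' key="value"'.
    strattrs = "".join([' %s="%s"' % (key, value) for key, value in attrs])
    # locals() now holds both tag and strattrs, so both names can be used.
    return "<%(tag)s%(strattrs)s>" % locals()


attrs = [("href", "index.html"), ("title", "Go to home page")]
print(build_start_tag("a", attrs))
```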
    544 553 <important>  
    545   <title>Performance issues with &locals;</title>  
    546   <para>Using dictionary-based string formatting with &locals; is a convenient way of making complex string formatting expressions more readable, but it comes with a price.  There is a slight performance hit in making the call to &locals;, since <link linkend="dialect.locals.readonly.example">&locals; builds a copy</link> of the local namespace.</para>  
      554 <title>使用 &locals; 的性能问题</title>  
      555 <para>使用 &locals; 来应用基于 dictionary 的字符串格式化是一种方便的作法,它可以使复杂的字符串格式化表达式更易读。但它需要花费一定的代价。在调用 &locals; 方面有一点性能上的问题,这是由于 <link linkend="dialect.locals.readonly.example"> &locals; 创建了局部名字空间的一个拷贝 </link> 引起的。</para>  
    547 556 </important>  
    548 557 </section>  
    549 558 <section id="dialect.quoting">  
    550 559 <?dbhtml filename="html_processing/quoting_attribute_values.html"?>  
    551   <title>Quoting attribute values</title>  
      560 <title>给属性值加引号</title>  
    551 560 <abstract>  
    552   <para>A common question on &clp; is <quote>I have a bunch of &html; documents with unquoted attribute values, and I want to properly quote them all.  How can I do this?</quote><footnote><para>All right, it's not that common a question.  It's not up there with <quote>What editor should I use to write &python; code?</quote> (answer: &emacs;) or <quote>Is &python; better or worse than &perl;?</quote> (answer: <quote>&perl; is worse than &python; because people wanted it worse.</quote> -Larry Wall, 10/14/1998)  But questions about &html; processing pop up in one form or another about once a month, and among those questions, this is a popular one.</para></footnote>  (This is generally precipitated by a project manager who has found the &html;-is-a-standard religion joining a large project and proclaiming that all pages must validate against an &html; validator.  Unquoted attribute values are a common violation of the &html; standard.)  Whatever the reason, unquoted attribute values are easy to fix by feeding &html; through &basehtml_classname;.</para>  
      561 <para>在 &clp; 上的一个常见问题是 <quote>我有一些 &html; 文档,属性值没有用引号括起来,并且我想将它们全部括起来,我怎么才能实现它呢?</quote>  
      562 <footnote><para>好吧,其实并不是那么普通的一个问题。在那不都是问 <quote>我应该用何种编辑器来写 &python; 代码?</quote> (回答: &emacs;) 或 <quote>&python; 比 &perl; 是好还是坏?</quote> (回答: <quote>&perl; 比 &python; 差,因为人们想让它差的。</quote> -Larry Wall,1998年10月14日) 但是关于 &html;  处理的问题,或者这种提法或者另一种提法,大约一个月就要出现一次,在这些问题之中,这个问题是最常见的一个。</para></footnote>  (一般这种事情的出现是由于一个项目经理加入到一个大的项目中来,而他又抱着 &html; 是一种标记语言的教条,要求所有的页面必须能够通过 &html; 校验器的验证。而属性值没有被引号括起来是一种常见的对 &html; 规范的违反。) 不管什么原因,未括起来的属性值通过将 &html; 送进 &basehtml_classname; 可以容易地修复。  
      563 </para>  
    553 564 </abstract>  
    554   <para>&basehtml_classname; consumes &html; (since it's descended from &sgmlparser;) and produces equivalent &html;, but the &html; output is not identical to the input.  Tags and attribute names will end up in lowercase, even if they started in uppercase or mixed case, and attribute values will be enclosed in double quotes, even if they started in single quotes or with no quotes at all.  It is this last side effect that you can take advantage of.</para>  
      565 <para>&basehtml_classname; 消费 (consume) &html;  (因为它是从 &sgmlparser; 派生来的) 并生成等价的 &html;。但是这个 &html; 输出与输入的并不一样。标记和属性名最终会转化为小写字母,即使它们可能以大写字母开始或是大小写的混和形式。属性值将被双引号引起来,即使它们原来可能是用单引号括起来的或根本没有括起来。这就是最后我们可以受益的边际效应。</para>  
    554 565 <example id="dialect.quoting.example">  
    555 566 <title>Quoting attribute values</title>  
     
    592 603 <calloutlist>  
    593 604 <callout arearefs="dialect.basehtml.3.1">  
    594   <para>Note that the attribute values of the <literal>href</literal> attributes in the <sgmltag>&lt;a&gt;</sgmltag> tags are not properly quoted.  (Also note that you're using <link linkend="odbchelper.triplequotes">triple quotes</link> for something other than a &docstring;.  And directly in the &ide;, no less.  They're very useful.)</para>  
      605 <para>请注意,在 <sgmltag>&lt;a&gt;</sgmltag> 标记中的 <literal>href</literal> 属性值没有被适当地括起来。 (还要注意,我们将 <link linkend="odbchelper.triplequotes">三重引号</link> 用到了 &docstring; 之外的其它地方,而且还是直接在 &ide; 中使用。它们非常有用。) </para>  
    594 605 </callout>  
    595 606 <callout arearefs="dialect.basehtml.3.2">  
    596   <para>Feed the parser.</para>  
      607 <para>装填分析器。</para>  
    596 607 </callout>  
    597 608 <callout arearefs="dialect.basehtml.3.3">  
    598   <para>Using the <function>output</function> function defined in &basehtml_classname;, you get the output as a single string, complete with quoted attribute values.  While this may seem anti-climactic, think about how much has actually happened here: &sgmlparser; parsed the entire &html; document, breaking it down into tags, refs, data, and so forth; &basehtml_classname; used those elements to reconstruct pieces of &html; (which are still stored in <varname>parser.pieces</varname>, if you want to see them); finally, you called <function>parser.output</function>, which joined all the pieces of &html; into one string.</para>  
      609 <para>使用定义在 &basehtml_classname; 中的 <function>output</function> 函数,我们得到单个字符串的输出,并且属性值被完全括起来了。让我们想一下这里实际上发生了多少事: &sgmlparser; 分析整个 &html; 文档,将其分解为一片片的标记、引用、数据等等。&basehtml_classname; 使用这些元素来重新构造 &html; 的片段 (如果您想查看的话它们仍然保存在 <varname>parser.pieces</varname> 中) 。最后,我们调用 <function>parser.output</function>,它将所有的 &html; 片段连接成一个字符串。</para>  
    598 609 </callout>  
    599 610 </calloutlist>  
     
    605 616 <section id="dialect.dialectizer">  
    606 617 <?dbhtml filename="html_processing/dialect.html"?>  
    607   <title>Introducing &dialect_filename;</title>  
      618 <title>&dialect_filename; 介绍</title>  
    607 618 <abstract>  
    608 619 <title/>  
    609   <para>&dialect_classname; is a simple (and silly) descendant of &basehtml_classname;.  It runs blocks of text through a series of substitutions, but it makes sure that anything within a <literal>&pre_starttag;...&pre_endtag;</literal> block passes through unaltered.</para>  
      620 <para>&dialect_classname; 是 &basehtml_classname; 的简单 (和拙劣) 的派生类。它通过一系列的替换对文本块进行了处理,但是它确保在 <literal>&pre_starttag;...&pre_endtag;</literal> 块之间的任何东西不被修改地通过。</para>  
    609 620 </abstract>  
    610   <para>To handle the &pre_starttag; blocks, you define two methods in &dialect_classname;: <function>start_pre</function> and <function>end_pre</function>.</para>  
      621 <para>为了处理 &pre_starttag; 块,我们在 &dialect_classname; 中定义了两个方法: <function>start_pre</function> 和 <function>end_pre</function>。</para>  
    610 621 <example id="dialect.specifictags.example">  
    611   <title>Handling specific tags</title>  
      622 <title>处理特别标记</title>  
    611 622 <programlisting>  
    612 623 &dialect_startpredef; <co id="dialect.dialectizer.1.1"/>  
     
    623 634 <calloutlist>  
    624 635 <callout arearefs="dialect.dialectizer.1.1">  
    625   <para><function>start_pre</function> is called every time &sgmlparser; finds a &pre_starttag; tag in the &html; source.  (In a minute, you'll see exactly how this happens.)  The method takes a single parameter, <varname>attrs</varname>, which contains the attributes of the tag (if any).  <varname>attrs</varname> is a list of key/value tuples, just like <link linkend="dialect.unknownstarttag">&unknown_starttag;</link> takes.</para>  
      636 <para>每次 &sgmlparser; 在 &html; 源代码中发现一个 &pre_starttag; 时,都会调用 <function>start_pre</function>。 (马上我们就会确切地看到它是如何发生的。) 这个方法使用单个参数: <varname>attrs</varname>,这个参数会包含标记的属性 (如果存在的话) 。 <varname>attrs</varname> 是一个键/值 tuple 的 list,就象 <link linkend="dialect.unknownstarttag">&unknown_starttag;</link> 中所使用的。</para>  
    625 636 </callout>  
    626 637 <callout arearefs="dialect.dialectizer.1.2">  
    627   <para>In the <function>reset</function> method, you initialize a data attribute that serves as a counter for &pre_starttag; tags.  Every time you hit a &pre_starttag; tag, you increment the counter; every time you hit a &pre_endtag; tag, you'll decrement the counter.  (You could just use this as a flag and set it to &one; and reset it to &zero;, but it's just as easy to do it this way, and this handles the odd (but possible) case of nested &pre_starttag; tags.)  In a minute, you'll see how this counter is put to good use.</para>  
      638 <para>在 <function>reset</function> 方法中,我们初始化了一个数据属性,它作为 &pre_starttag; 标记的一个计数器。每次我们找到一个 &pre_starttag; 标记,我们增加计数器的值;每次我们找到一个 &pre_endtag; 标记,我们将减少计数器的值。 (我们可以将它作为一个标志,并且把它设为 &one; 或重置为 &zero;,但是这样做只是为了方便,并且这样做可以处理古怪 (但有可能) 的 &pre_starttag; 标记嵌套的情况。) 马上我们将会看到这个计数器是多么的好用。</para>  
    627 638 </callout>  
    628 639 <callout arearefs="dialect.dialectizer.1.3">  
    629   <para>That's it, that's the only special processing you do for &pre_starttag; tags.  Now you pass the list of attributes along to &unknown_starttag; so it can do the default processing.</para>  
      640 <para>不错,这就是我们对 &pre_starttag; 标记所做的唯一的特殊处理。现在我们将属性列表传给 &unknown_starttag;,由它来进行缺省的处理。</para>  
    629 640 </callout>  
    630 641 <callout arearefs="dialect.dialectizer.1.4">  
    631   <para><function>end_pre</function> is called every time &sgmlparser; finds a &pre_endtag; tag.  Since end tags can not contain attributes, the method takes no parameters.</para>  
      642 <para>每次 &sgmlparser; 找到一个 &pre_endtag; 标记时会调用 <function>end_pre</function>。因为结束标记不能包含属性,因此这个方法没有参数。</para>  
    631 642 </callout>  
    632 643 <callout arearefs="dialect.dialectizer.1.5">  
    633   <para>First, you want to do the default processing, just like any other end tag.</para>  
      644 <para>首先我们要进行缺省处理,就象其它结束标记做的一样。</para>  
    633 644 </callout>  
    634 645 <callout arearefs="dialect.dialectizer.1.6">  
    635   <para>Second, you decrement your counter to signal that this &pre_starttag; block has been closed.</para>  
      646 <para>其次我们将计数器减少,标记这个 &pre_starttag; 块已经被关闭了。</para>  
    635 646 </callout>  
    636 647 </calloutlist>  
    637 648 </example>  
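The counter logic described in these callouts can be sketched without the parser itself. The class below is a hypothetical stand-in written for Python 3 (the name `PreCounter` and the simplified `unknown_starttag` and `unknown_endtag` are assumptions for illustration); it only shows how `start_pre` and `end_pre` raise and lower the counter around the default processing.

```python
class PreCounter:
    """Hypothetical stand-in that tracks nesting depth of <pre> blocks."""

    def __init__(self):
        self.reset()

    def reset(self):
        self.verbatim = 0   # counter for open <pre> tags
        self.pieces = []    # output buffer

    def unknown_starttag(self, tag, attrs):
        # Simplified default processing: rebuild the start tag.
        strattrs = "".join(' %s="%s"' % (key, value) for key, value in attrs)
        self.pieces.append("<%s%s>" % (tag, strattrs))

    def unknown_endtag(self, tag):
        # Simplified default processing: rebuild the end tag.
        self.pieces.append("</%s>" % tag)

    def start_pre(self, attrs):
        self.verbatim += 1                   # entering a <pre> block
        self.unknown_starttag("pre", attrs)  # then do the default processing

    def end_pre(self):
        self.unknown_endtag("pre")           # default processing first
        self.verbatim -= 1                   # then mark the block as closed


counter = PreCounter()
counter.start_pre([])
counter.start_pre([("class", "code")])  # the odd but possible nested case
counter.end_pre()
print(counter.verbatim)
```

Using a counter instead of a flag means the nested case still reports that one block is open after the inner block closes.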
    638   <para>At this point, it's worth digging a little further into &sgmlparser;.  I've claimed repeatedly (and you've taken it on faith so far) that &sgmlparser; looks for and calls specific methods for each tag, if they exist.  For instance, you just saw the definition of <function>start_pre</function> and <function>end_pre</function> to handle &pre_starttag; and &pre_endtag;.  But how does this happen?  Well, it's not magic, it's just good &python; coding.</para>  
      649 <para>到了这个地方,有必要对 &sgmlparser; 更深入一层。我已经多次声明 (到目前为止您应已经把它做为信条了) ,就是 &sgmlparser; 查找每一个标记并且如果存在特定的方法就调用它们。例如: 我们刚刚看到处理 &pre_starttag; 和 &pre_endtag; 的 <function>start_pre</function> 和 <function>end_pre</function> 的定义。但这是如何发生的呢?嗯,也没什么神奇的,只不过是出色的 &python; 编码。</para>  
    638 649 <example id="dialect.dialectizer.example">  
    639 650 <title>&sgmlparser;</title>  
     
    667 678 <calloutlist>  
    668 679 <callout arearefs="dialect.dialectizer.2.1">  
    669   <para>At this point, &sgmlparser; has already found a start tag and parsed the attribute list.  The only thing left to do is figure out whether there is a specific handler method for this tag, or whether you should fall back on the default method (&unknown_starttag;).</para>  
      680 <para>此处,&sgmlparser; 已经找到了一个开始标记,并且分析出属性列表。唯一要做的事情就是找到对于这个标记是否存在一个特别的处理方法,或者是否我们应该求助于缺省方法 (&unknown_starttag;) 。</para>  
    669 680 </callout>  
    670 681 <callout arearefs="dialect.dialectizer.2.2">  
    671   <para>The <quote>magic</quote> of &sgmlparser; is nothing more than your old friend, <link linkend="apihelper.getattr">&getattr;</link>.  What you may not have realized before is that &getattr; will find methods defined in descendants of an object as well as the object itself.  Here the object is &self;, the current instance.  So if <varname>tag</varname> is <literal>'pre'</literal>, this call to &getattr; will look for a <function>start_pre</function> method on the current instance, which is an instance of the &dialect_classname; class.</para>  
      682 <para>&sgmlparser; 的 <quote>神奇</quote> 之处除了我们的老朋友 <link linkend="apihelper.getattr">&getattr;</link> 之外就没有什么了。您以前可能还没注意到的是 &getattr; 将查找定义在一个对象的继承者中或对象自身的方法。这里对象是 &self;,即当前实例。所以,如果 <varname>tag</varname> 是 <literal>'pre'</literal>,这里对 &getattr; 的调用将会在当前实例 (它是 &dialect_classname; 类的一个实例) 中查找一个名为 <function>start_pre</function> 的方法。</para>  
    671 682 </callout>  
    672 683 <callout arearefs="dialect.dialectizer.2.3">  
    673   <para>&getattr; raises an &attributeerror; if the method it's looking for doesn't exist in the object (or any of its descendants), but that's okay, because you wrapped the call to &getattr; inside a <link linkend="fileinfo.exception">&tryexcept;</link> block and explicitly caught the &attributeerror;.</para>  
      684 <para>如果 &getattr; 所查找的方法在对象或它的任何继承者中不存在的话,它会引发一个 &attributeerror; 的异常。但没有关系,因为我们把对 &getattr; 的调用包装到一个 <link linkend="fileinfo.exception">&tryexcept;</link> 块中了,并且显式地捕捉 &attributeerror; 异常。</para>  
    673 684 </callout>  
    674 685 <callout arearefs="dialect.dialectizer.2.4">  
    675   <para>Since you didn't find a <function>start_xxx</function> method, you'll also look for a <function>do_xxx</function> method before giving up.  This alternate naming scheme is generally used for standalone tags, like <sgmltag>&lt;br></sgmltag>, which have no corresponding end tag.  But you can use either naming scheme; as you can see, &sgmlparser; tries both for every tag.  (You shouldn't define both a <function>start_xxx</function> and <function>do_xxx</function> handler method for the same tag, though; only the <function>start_xxx</function> method will get called.)</para>  
      686 <para>因为我们没有找到一个 <function>start_xxx</function> 方法,在放弃之前,我们还要查找一个 <function>do_xxx</function> 方法。这个可替换的命名模式一般用于单独的标记,如 <sgmltag>&lt;br></sgmltag>,这些标记没有相应的结束标记。但是您可以使用任何一种模式,正如您看到的,&sgmlparser; 对每个标记两种模式都会尝试。 (您不应该对相同的标记同时定义 <function>start_xxx</function> 和 <function>do_xxx</function> 处理方法,因为这样的话只有 <function>start_xxx</function> 方法会被调用。) </para>  
    675 686 </callout>  
    676 687 <callout arearefs="dialect.dialectizer.2.5">  
    677   <para>Another &attributeerror;, which means that the call to &getattr; failed with <function>do_xxx</function>.  Since you found neither a <function>start_xxx</function> nor a <function>do_xxx</function> method for this tag, you catch the exception and fall back on the default method, &unknown_starttag;.</para>  
      688 <para>另一个 &attributeerror; 异常,它是说用 <function>do_xxx</function> 来调用 &getattr; 失败了。因为对这个标记我们既没有找到 <function>start_xxx</function> 也没有找到 <function>do_xxx</function> 处理方法,这样我们捕捉到了异常并且求助于缺省方法:  &unknown_starttag;。</para>  
    677 688 </callout>  
    678 689 <callout arearefs="dialect.dialectizer.2.6">  
    679   <para>Remember, &tryexcept; blocks can have an &else; clause, which is called if <link linkend="crossplatform.example">no exception is raised</link> during the &tryexcept; block.  Logically, that means that you <emphasis>did</emphasis> find a <function>do_xxx</function> method for this tag, so you're going to call it.</para>  
      690 <para>记得吗?&tryexcept; 块可以有一个 &else; 子句,当在 &tryexcept; 块中 <link linkend="crossplatform.example">没有异常被引发</link> 时,它将被调用。逻辑上,意味着我们 <emphasis>确实</emphasis> 找到了这个标记的 <function>do_xxx</function> 方法,所以我们将要调用它。</para>  
    679 690 </callout>  
    680 691 <callout arearefs="dialect.dialectizer.2.7">  
    681   <para>By the way, don't worry about these different return values; in theory they mean something, but they're never actually used.  Don't worry about the <literal>self.stack.append(tag)</literal> either; &sgmlparser; keeps track internally of whether your start tags are balanced by appropriate end tags, but it doesn't do anything with this information either.  In theory, you could use this module to validate that your tags were fully balanced, but it's probably not worth it, and it's beyond the scope of this chapter.  You have better things to worry about right now.</para>  
      692 <para>顺便说, 不要为这些不同的返回值而担心; 理论上他们有意义, 但实际上它们没有任何用处。也不要担心   <literal>self.stack.append(tag)</literal> ; &sgmlparser; 内部会知晓您的开始标记是否有合适的结束标记与之匹配, 但是它不会对这些信息做任何操作。理论上, 您能使用这个模块校验您的标记是否完全匹配, 但是这或许没有多大价值, 并且这样的内容已经超出了本章所要讨论的范畴。现在有您更需要担心的问题。</para>  
    681 692 </callout>  
    682 693 <callout arearefs="dialect.dialectizer.2.8">  
    683   <para><function>start_xxx</function> and <function>do_xxx</function> methods are not called directly; the tag, method, and attributes are passed to this function, <function>handle_starttag</function>, so that descendants can override it and change the way <emphasis>all</emphasis> start tags are dispatched.  You don't need that level of control, so you just let this method do its thing, which is to call the method (<function>start_xxx</function> or <function>do_xxx</function>) with the list of attributes.  Remember, <varname>method</varname> is a function, returned from &getattr;, and functions are objects.  (I know you're getting tired of hearing it, and I promise I'll stop saying it as soon as I run out of ways to use it to my advantage.)  Here, the function object is passed into this dispatch method as an argument, and this method turns around and calls the function.  At this point, you don't need to know what the function is, what it's named, or where it's defined; the only thing you need to know about the function is that it is called with one argument, <varname>attrs</varname>.</para>  
      694 <para><function>start_xxx</function> 和 <function>do_xxx</function> 方法并不被直接调用,标记名、方法和属性被传给 <function>handle_starttag</function> 这个方法,以便继承者可以覆盖它,并改变 <emphasis>全部</emphasis> 开始标记分发的方式。我们不需要这种层次的控制,所以我们只让这个方法做它自己的事,就是用属性的 list 来调用方法 (<function>start_xxx</function> 或 <function>do_xxx</function>) 。记住 <varname>method</varname> 是一个从 &getattr; 返回的函数,还有函数是对象。 (我知道您已经听腻了,我发誓,一旦我再也找不到利用它的新方法,我就决不再提它了。) 这时,函数对象作为一个参数传入这个分发方法,这个方法反过来再调用这个函数。在这里,我们不需要知道函数是什么,叫什么名字,或是在哪里定义的;我们只需要知道用一个参数 <varname>attrs</varname> 调用它。</para>  
    683 694 </callout>  
    684 695 </calloutlist>  
    685 696 </example>  
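The dispatch described in the callouts above can be sketched as a small standalone class. This is an illustration in Python 3 modeled on the steps the callouts list, not a copy of the library source; the class names `TagDispatcher` and `Demo` and the `calls` list are hypothetical and exist only to make the behavior observable.

```python
class TagDispatcher:
    """Hypothetical sketch of getattr-based start tag dispatch."""

    def __init__(self):
        self.stack = []   # tags opened through a start_xxx handler
        self.calls = []   # record of what was dispatched, for demonstration

    def finish_starttag(self, tag, attrs):
        try:
            method = getattr(self, "start_" + tag)
        except AttributeError:
            try:
                method = getattr(self, "do_" + tag)
            except AttributeError:
                # Neither handler exists: fall back on the default method.
                self.unknown_starttag(tag, attrs)
                return -1
            else:
                # No exception was raised, so a do_xxx method was found.
                self.handle_starttag(tag, method, attrs)
                return 0
        else:
            # A start_xxx method was found; remember the open tag.
            self.stack.append(tag)
            self.handle_starttag(tag, method, attrs)
            return 1

    def handle_starttag(self, tag, method, attrs):
        # method is a function object; all we know is that it takes attrs.
        method(attrs)

    def unknown_starttag(self, tag, attrs):
        self.calls.append(("unknown", tag))


class Demo(TagDispatcher):
    def start_pre(self, attrs):
        self.calls.append(("start_pre", attrs))

    def do_br(self, attrs):
        self.calls.append(("do_br", attrs))


demo = Demo()
for tag in ("pre", "br", "p"):
    print(tag, demo.finish_starttag(tag, []))
```

Because the lookup goes through `getattr` on the instance, handlers defined on the subclass `Demo` are found even though the dispatch code lives in the base class.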
    686   <para>Now back to our regularly scheduled program: &dialect_classname;.  When you left, you were in the process of defining specific handler methods for &pre_starttag; and &pre_endtag; tags.  There's only one thing left to do, and that is to process text blocks with the pre-defined substitutions.  For that, you need to override the &handle_data; method.</para>  
      697 <para>现在回到我们已经计划好的程序: &dialect_classname;。当我们跑题时,我们正在定义特别的处理方法来处理 &pre_starttag; 和 &pre_endtag; 标记。还有一件事没有做,那就是用我们预定义的替换处理来处理文本块。为了实现它,我们需要覆盖 &handle_data; 方法。</para>  
    686 697 <example>  
    687   <title>Overriding the &handle_data; method</title>  
      698 <title>覆盖 &handle_data; 方法</title>  
    687 698 <programlisting>  
    688 699 &dialect_datadef; <co id="dialect.dialectizer.3.1"/>  
     
    700 711 <calloutlist>  
    701 712 <callout arearefs="dialect.dialectizer.3.1">  
    702   <para>&handle_data; is called with only one argument, the text to process.</para>  
      713 <para>&handle_data; 在调用时只使用一个参数: 要处理的文本。</para>  
    702 713 </callout>  
    703 714 <callout arearefs="dialect.dialectizer.3.2">  
    704   <para>In the ancestor <link linkend="dialect.basehtml.intro">&basehtml_classname;</link>, the &handle_data; method simply appended the text to the output buffer, &selfpieces;.  Here the logic is only slightly more complicated.  If you're in the middle of a <literal>&pre_starttag;...&pre_endtag;</literal> block, <varname>self.verbatim</varname> will be some value greater than &zero;, and you want to put the text in the output buffer unaltered.  Otherwise, you will call a separate method to process the substitutions, then put the result of that into the output buffer.  In &python;, this is a one-liner, using <link linkend="apihelper.andortrick.intro">the &andor; trick</link>.</para>  
      715 <para>在祖先类 <link linkend="dialect.basehtml.intro">&basehtml_classname;</link> 中,&handle_data; 方法只是将文本追加到输出缓冲区 &selfpieces; 之后。这里的逻辑稍微有点复杂。如果我们处于 <literal>&pre_starttag;...&pre_endtag;</literal> 块的中间,<varname>self.verbatim</varname> 将是大于 &zero; 的某个值,接着我们想要将文本不作改动地传入输出缓冲区。否则,我们将调用另一个单独的方法来进行替换处理,然后将处理结果放入输出缓冲区中。在 &python; 中,这是一个一行代码,它使用了<link linkend="apihelper.andortrick.intro">&andor; 技巧</link>。</para>  
    704 715 </callout>  
    705 716 </calloutlist>  
    706 717 </example>  
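The one-liner described above can be sketched as follows. This is a hypothetical Python 3 stand-in: the class name `VerbatimDemo` and the body of `process` (a single made-up substitution) are assumptions for illustration only. It uses the and-or form the callout mentions; note that the and-or form falls through to `process` when `text` is an empty string, which is harmless here.

```python
class VerbatimDemo:
    """Hypothetical sketch of handle_data with a verbatim counter."""

    def __init__(self):
        self.verbatim = 0   # greater than 0 while inside <pre>...</pre>
        self.pieces = []    # output buffer

    def process(self, text):
        # Made-up substitution standing in for the real dialect rules.
        return text.replace("the", "zee")

    def handle_data(self, text):
        # Inside a <pre> block, keep the text unaltered;
        # otherwise run it through the substitutions first.
        self.pieces.append(self.verbatim and text or self.process(text))


demo = VerbatimDemo()
demo.handle_data("the first line ")
demo.verbatim = 1
demo.handle_data("the second line")
print("".join(demo.pieces))
```

In current Python the same choice can be written as a conditional expression, `text if self.verbatim else self.process(text)`, which avoids the empty-string corner case.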
    707   <para>You're close to completely understanding &dialect_classname;.  The only missing link is the nature of the text substitutions themselves.  If you know any &perl;, you know that when complex text substitutions are required, the only real solution is regular expressions.  The classes later in &dialect_filename; define a series of regular expressions that operate on the text between the &html; tags.  But you just had <link linkend="re">a whole chapter on regular expressions</link>.  You don't really want to slog through regular expressions again, do you?  God knows I don't.  I think you've learned enough for one chapter.</para>  
    708   </section>  
      718 <para>我们已经接近了对 &dialect_classname; 的全面理解。唯一缺少的一个环节是文本替换的特性。如果您知道点 &perl;,您就会知道当需要复杂的文本替换时,唯一有效的解决方法就是正则表达式。 在 &dialect_filename; 文件后面的几个类中定义了一连串的正则表达式来操作 &html; 标记之间的文本。但是我们刚刚学完 <link linkend="re">关于正则表达式的一整章</link>。我们不必再重复正则表达式的艰难历程了, 不是吗?上帝知道我反正不想。我想这一章您已经学得足够多了。</para></section>  
    709 719 <section id="dialect.alltogether">  
    710 720 <?dbhtml filename="html_processing/all_together.html"?>  
    711   <title>Putting it all together</title>  
      721 <title>全部放在一起</title>  
    711 721 <abstract>  
    712 722 <title/>  
    713   <para>It's time to put everything you've learned so far to good use.  I hope you were paying attention.</para>  
      723 <para>到了该将迄今为止我们已经学过并用得不错的东西放在一起的时候了。我希望您专心些。</para>  
    713 723 </abstract>  
    714 724 <example>  
    715   <title>The &translate; function, part 1</title>  
      725 <title>&translate; 函数, 第 1 部分</title>  
    715 725 <programlisting>  
    716 726 &dialect_translatedef; <co id="dialect.alltogether.1.1"/>  
     
    727 737 <calloutlist>  
    728 738 <callout arearefs="dialect.alltogether.1.1">  
    729   <para>The &translate; function has an <link linkend="apihelper.optional">optional argument</link> <varname>dialectName</varname>, which is a string that specifies the dialect you'll be using.  You'll see how this is used in a minute.</para>  
      739 <para>这个 &translate; 函数有一个 <link linkend="apihelper.optional">可选参数</link>  <varname>dialectName</varname>,它是一个字符串,指出我们将使用的方言。一会我们就会看到它是如何使用的。</para>  
    729 739 </callout>  
    730 740 <callout arearefs="dialect.alltogether.1.2">  
    731   <para>Hey, wait a minute, there's an <link linkend="odbchelper.import">&import;</link> statement in this function!  That's perfectly legal in &python;.  You're used to seeing &import; statements at the top of a program, which means that the imported module is available anywhere in the program.  But you can also import modules within a function, which means that the imported module is only available within the function.  If you have a module that is only ever used in one function, this is an easy way to make your code more modular.  (When you find that your weekend hack has turned into an 800-line work of art and decide to split it up into a dozen reusable modules, you'll appreciate this.)</para>  
      741 <para>嘿,等一下,在这个函数中有一个 <link linkend="odbchelper.import">&import;</link> 语句!它在 &python; 中完全合法。您已经习惯了在一个程序的前面看到 &import; 语句,它意味着导入的模块在程序的任何地方都是可用的。但您也可以在一个函数中导入模块,这意味着导入的模块只能在函数中使用。如果您有一个只能用在一个函数中的模块,这是一个简便的方法,使您的代码更模块化。 (当发现您周末的加班已经变成了一个 800行 的艺术作品,并且决定将其分割成一打可重用的模块时,您会感谢它的。) </para>  
    731 741 </callout>  
    732 742 <callout arearefs="dialect.alltogether.1.3">  
    733   <para>Now you <link linkend="dialect.extract.urllib">get the source of the given URL</link>.</para>  
      743 <para>现在我们<link linkend="dialect.extract.urllib">得到了给定的URL的原始资料</link>。</para>  
    733 743 </callout>  
    734 744 </calloutlist>  
    735 745 </example>  
    736 746 <example>  
    737   <title>The &translate; function, part 2: curiouser and curiouser</title>  
      747 <title>&translate; 函数, 第 2 部分: 奇妙而又奇妙</title>  
    737 747 <programlisting>  
    738 748 &dialect_translateparsername; <co id="dialect.alltogether.2.1"/>  
     
    746 756 <calloutlist>  
    747 757 <callout arearefs="dialect.alltogether.2.1">  
    748   <para>&capitalize; is a string method you haven't seen before; it simply capitalizes the first letter of a string and forces everything else to lowercase.  Combined with some <link linkend="odbchelper.stringformatting">string formatting</link>, you've taken the name of a dialect and transformed it into the name of the corresponding Dialectizer class.  If <varname>dialectName</varname> is the string <literal>'chef'</literal>, <varname>parserName</varname> will be the string <literal>'ChefDialectizer'</literal>.</para>  
      758 <para>&capitalize; 是一个我们以前未曾见过的字符串方法;它只是将一个字符串的第一个字母变成大写,将其它的字母强制变成小写。与某个 <link linkend="odbchelper.stringformatting">字符串格式化</link>合在一起使用后,我们就得到了一种方言的名字,接着将它转化为相应的方言变换器类的名字。如果 <varname>dialectName</varname> 是字符串 <literal>'chef'</literal>,<varname>parserName</varname> 将是字符串 <literal>'ChefDialectizer'</literal>。</para>  
    748 758 </callout>  
    749 759 <callout arearefs="dialect.alltogether.2.2">  
    750   <para>You have the name of a class as a string (<varname>parserName</varname>), and you have the global namespace as a dictionary (&globals;()).  Combined, you can get a reference to the class which the string names.  (Remember, <link linkend="fileinfo.classattributes">classes are objects</link>, and they can be assigned to variables just like any other object.)  If <varname>parserName</varname> is the string <literal>'ChefDialectizer'</literal>, <varname>parserClass</varname> will be the class <literal>ChefDialectizer</literal>.</para>  
      760 <para>我们有了一个字符串形式 (<varname>parserName</varname>) 的类名称,还有一个 dictionary (&globals;()) 形式的全局名字空间。合起来后,我们可以得到一个以前面字符串命名的类的引用。 (回想一下,<link linkend="fileinfo.classattributes">类是对象</link>,并且它们可以象其它对象一样赋值给一个变量。) 如果 <varname>parserName</varname> 是字符串 <literal>'ChefDialectizer'</literal>,<varname>parserClass</varname> 将是类 <literal>ChefDialectizer</literal>。</para>  
    750 760 </callout>  
    751 761 <callout arearefs="dialect.alltogether.2.3">  
    752   <para>Finally, you have a class object (<varname>parserClass</varname>), and you want an instance of the class.  Well, you already know how to do that: <link linkend="fileinfo.create">call the class like a function</link>.  The fact that the class is being stored in a local variable makes absolutely no difference; you just call the local variable like a function, and out pops an instance of the class.  If <varname>parserClass</varname> is the class <literal>ChefDialectizer</literal>, <varname>parser</varname> will be an instance of the class <literal>ChefDialectizer</literal>.</para>  
      762 <para>最后,我们拥有了一个类对象 (<varname>parserClass</varname>),接着我们想要生成这个类的一个实例。好,我们已经知道如何去做了: <link linkend="fileinfo.create">象函数一样调用类</link>。这个类保存在一个局部变量中的事实完全不会有什么影响;我们只是象函数一样调用这个局部变量,取出这个类的一个实例。如果 <varname>parserClass</varname> 是类 <literal>ChefDialectizer</literal>,<varname>parser</varname> 将是类 <literal>ChefDialectizer</literal> 的一个实例。</para>  
    752 762 </callout>  
    753 763 </calloutlist>  
    754 764 </example>  
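The two callouts above (look the class up by name in the global namespace, then call it like a function) can be sketched in isolation. This is a minimal illustration rather than the chapter's actual code: the two empty classes are hypothetical placeholders, `make_parser` is a name invented here, and building the class name with `capitalize()` is an assumption about the naming scheme.

```python
class ChefDialectizer:
    """Hypothetical placeholder for one of the chapter's dialect classes."""


class FuddDialectizer:
    """Second hypothetical placeholder, so the lookup has a choice to make."""


def make_parser(dialect_name):
    # Build the class name as a string, e.g. 'chef' -> 'ChefDialectizer'
    # (assumed naming scheme).
    parser_name = "%sDialectizer" % dialect_name.capitalize()
    # globals() is a dictionary of this module's namespace, so indexing it
    # with the string yields a reference to the class object itself.
    parser_class = globals()[parser_name]
    # Calling the class, even through a local variable, creates an instance.
    return parser_class()


parser = make_parser("chef")
```

An unknown dialect name raises `KeyError` here; a real function would probably want to catch that and report the bad name.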
    755   <para>Why bother?  After all, there are only 3 <classname>Dialectizer</classname> classes; why not just use a <function>case</function> statement?  (Well, there's no <function>case</function> statement in &python;, but why not just use a series of &if; statements?)  One reason: extensibility.  The &translate; function has absolutely no idea how many Dialectizer classes you've defined.  Imagine if you defined a new <classname>FooDialectizer</classname> tomorrow; &translate; would work by passing <literal>'foo'</literal> as the <varname>dialectName</varname>.</para>  
    756   <para>Even better, imagine putting <classname>FooDialectizer</classname> in a separate module, and importing it with &frommoduleimport;.  You've already seen that this <link linkend="dialect.globals.example">includes it in &globals;()</link>, so &translate; would still work without modification, even though <classname>FooDialectizer</classname> was in a separate file.</para>  
    757   <para>Now imagine that the name of the dialect is coming from somewhere outside the program, maybe from a database or from a user-inputted value on a form.  You can use any number of server-side &python; scripting architectures to dynamically generate web pages; this function could take a &url; and a dialect name (both strings) in the query string of a web page request, and output the <quote>translated</quote> web page.</para>  
    758   <para>Finally, imagine a <classname>Dialectizer</classname> framework with a plug-in architecture.  You could put each <classname>Dialectizer</classname> class in a separate file, leaving only the &translate; function in &dialect_filename;.  Assuming a consistent naming scheme, the &translate; function could dynamically import the appropriate class from the appropriate file, given nothing but the dialect name.  (You haven't seen dynamic importing yet, but I promise to cover it in a later chapter.)  To add a new dialect, you would simply add an appropriately-named file in the plug-ins directory (like <filename>foodialect.py</filename> which contains the <classname>FooDialectizer</classname> class).  Calling the &translate; function with the dialect name <literal>'foo'</literal> would find the module <filename>foodialect.py</filename>, import the class <classname>FooDialectizer</classname>, and away you go.</para>  
      765 <para>怎么这么麻烦?毕竟只有三个 <classname>Dialectizer</classname> 类;为什么不只使用一个 <function>case</function> 语句? (噢,在 &python; 中不存在 <function>case</function> 语句,但为什么不只使用一组 &if; 语句呢?) 理由之一是: 可扩展性。这个 &translate; 函数完全不用关心我们定义了多少个方言变换器类。设想一下,如果我们明天定义了一个新的 <classname>FooDialectizer</classname> 类,把 <literal>'foo'</literal> 作为 <varname>dialectName</varname> 传给 &translate; , &translate; 也能工作。</para>  
      766 <para>更妙的是,设想将 <classname>FooDialectizer</classname> 放进一个独立的模块中,使用 &frommoduleimport; 将其导入。我们已经知道了,这样会将它 <link linkend="dialect.globals.example">包含在 &globals;() 中</link>,所以不用修改 &translate;,它仍然可以正确运行,尽管 <classname>FooDialectizer</classname> 位于一个独立的文件中。</para>  
      767 <para>现在设想一下方言的名字是从程序外面的某个地方来的,也许是从一个数据库中,或从一个表格中的用户输入的值中。您可以使用任意多的服务端 &python; 脚本架构来动态地生成网页;这个函数将接收在页面请求的查询字符串中的一个 &url; 和一个方言名字 (两个都是字符串) ,接着输出 <quote>翻译</quote> 后的网页。 </para>  
      768 <para>最后,设想一下,使用了一种插件架构的 <classname>Dialectizer</classname> 框架。您可以将每个 <classname>Dialectizer</classname> 类分别放在独立的文件中,在 &dialect_filename; 中只留下 &translate; 函数。假定一种统一的命名模式,这个 &translate; 函数能够动态地从合适的文件中导入合适的类,除了方言名字外什么都不用给出。 (虽然您还没有看过动态导入,但我保证在后面的一章中会涉及到它。) 如果要加入一种新的方言,您只要在插件目录下加入一个以合适的名字命名的文件 (象 <filename>foodialect.py</filename>,它包含了 <classname>FooDialectizer</classname> 类) 。使用方言名 <literal>'foo'</literal> 来调用这个 &translate; 函数,将会查找 <filename>foodialect.py</filename> 模块,导入 <classname>FooDialectizer</classname> 类,这样就行了。</para>  
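The plug-in idea described above can be sketched with the standard library's `importlib.import_module`. This is only a sketch under stated assumptions: the `<name>dialect` / `<Name>Dialectizer` naming scheme, the `load_dialectizer` helper, and the `translate_text` method are all hypothetical, and the plug-in file is simulated by registering an in-memory module in `sys.modules` so that the example runs without touching the disk.

```python
import importlib
import sys
import types


class FooDialectizer:
    """Hypothetical dialect class that would normally live in foodialect.py."""

    def translate_text(self, text):
        # Placeholder "translation": shout everything.
        return text.upper()


# Simulate a plug-in file named foodialect.py by registering a module
# object under that name (assumption made only to keep this self-contained).
plugin = types.ModuleType("foodialect")
plugin.FooDialectizer = FooDialectizer
sys.modules["foodialect"] = plugin


def load_dialectizer(dialect_name):
    # Derive both names from the dialect name alone (assumed naming scheme).
    module_name = "%sdialect" % dialect_name.lower()
    class_name = "%sDialectizer" % dialect_name.capitalize()
    # Import the module by its string name, then fetch the class from it.
    module = importlib.import_module(module_name)
    return getattr(module, class_name)


dialectizer = load_dialectizer("foo")()
```

With real plug-in files, the registration step disappears: dropping a new `bardialect.py` containing `BarDialectizer` into an importable directory would be enough for `load_dialectizer("bar")` to find it.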
    759 769 <example>  
    760   <title>The &translate; function, part 3</title>  
      770 <title>&translate; 函数, 第 3 部分</title>  
    760 770 <programlisting>  
    761 771 &dialect_translatefeed; <co id="dialect.alltogether.3.1"/>  
    769 779 <calloutlist>  
    770 780 <callout arearefs="dialect.alltogether.3.1">  
    771   <para>After all that imagining, this is going to seem pretty boring, but the <function>feed</function> function is what <link linkend="dialect.feed.example">does the entire transformation</link>.  You had the entire &html; source in a single string, so you only had to call <function>feed</function> once.  However, you can call <function>feed</function> as often as you want, and the parser will just keep parsing.  So if you were worried about memory usage (or you knew you were going to be dealing with very large &html; pages), you could set this up in a loop, where you read a few bytes of &html; and fed it to the parser.  The result would be the same.</para>  
      781 <para>经过了这么多设想,这里看起来可能相当乏味,不过正是 <function>feed</function> 函数 <link linkend="dialect.feed.example">执行了全部的转换工作</link>。我们拥有存在于单个字符串中的全部 &html; 源代码,所以我们只需要调用 <function>feed</function> 一次。然而,您可以根据需要多次调用 <function>feed</function>,分析器会接着继续分析。所以如果我们担心内存的使用 (或者我们已经知道了将要处理非常巨大的 &html; 页面) ,我们可以在一个循环中调用它,即每次读出几个字节的 &html;,就将其送进分析器。结果会是一样的。</para>  
    771 781 </callout>  
    772 782 <callout arearefs="dialect.alltogether.3.2">  
    773   <para>Because <function>feed</function> maintains an internal buffer, you should always call the parser's &close; method when you're done (even if you fed it all at once, like you did).  Otherwise you may find that your output is missing the last few bytes.</para>  
      783 <para>因为 <function>feed</function> 维护着一个内部缓冲区,当您完成时,应该总是调用分析器的 &close; 方法 (哪怕您象我们做的一样,一次就全部送出) 。否则您可能会发现,输出丢掉了最后几个字节。</para>  
    773 783 </callout>  
    774 784 <callout arearefs="dialect.alltogether.3.3">  
    775   <para>Remember, <function>output</function> is the function you defined on &basehtml_classname; that <link linkend="dialect.output.example">joins all the pieces of output you've buffered</link> and returns them in a single string.</para>  
      785 <para>回想一下,<function>output</function> 是我们在 &basehtml_classname; 上定义的函数,用来 <link linkend="dialect.output.example">将所有缓冲的输出片段连接起来</link> 并且以单个字符串返回。</para>  
    775 785 </callout>  
    776 786 </calloutlist>  
    777 787 </example>  
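The feed / close / output sequence described in these callouts can be tried out on a small scale. The sketch below assumes Python 3's `html.parser.HTMLParser` as a stand-in for the chapter's `sgmllib`-based processor (both expose `feed` and `close`); the `TextCollector` class, its `output` method, and the `collect_text` helper are hypothetical names invented for this illustration.

```python
from html.parser import HTMLParser


class TextCollector(HTMLParser):
    """Hypothetical stand-in for the chapter's processor: buffers text pieces."""

    def __init__(self):
        super().__init__()
        self.pieces = []

    def handle_data(self, data):
        # Text may arrive in several pieces, so buffer rather than concatenate.
        self.pieces.append(data)

    def output(self):
        # Join all buffered pieces into a single string.
        return "".join(self.pieces)


def collect_text(html, chunk_size=None):
    parser = TextCollector()
    if chunk_size is None:
        # Whole document in one string: a single feed() call is enough.
        parser.feed(html)
    else:
        # Feed a few characters at a time; the parser keeps parsing.
        for start in range(0, len(html), chunk_size):
            parser.feed(html[start:start + chunk_size])
    # Always close, so anything still held in the internal buffer is flushed.
    parser.close()
    return parser.output()


SAMPLE = "<p>Hello <b>world</b>!</p>"
all_at_once = collect_text(SAMPLE)
in_chunks = collect_text(SAMPLE, chunk_size=4)
```

Both calls end up with the same text, even though the chunked version splits some tags across `feed` calls.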
    778   <para>And just like that, you've <quote>translated</quote> a web page, given nothing but a &url; and the name of a dialect.</para>  
      788 <para>就这样,仅凭一个 &url; 和一种方言的名字,我们就 <quote>翻译</quote> 了一个网页。</para>  
    778 788 <itemizedlist role="furtherreading">  
    779   <title>Further reading</title>  
    780   <listitem><para>You thought I was kidding about the server-side scripting idea.  So did I, until I found <ulink url="http://rinkworks.com/dialect/">this web-based dialectizer</ulink>.  Unfortunately, source code does not appear to be available.</para></listitem>  
      789 <title>进一步阅读</title>  
      790 <listitem><para>您可能以为我说的服务端脚本编程只是开玩笑。在发现<ulink url="http://rinkworks.com/dialect/">这个基于 web 的方言转换器</ulink>之前,我也是这样认为的。  不幸的是,它的源代码似乎没有公开。</para></listitem>  
    781 791 </itemizedlist>  
    782 792 </section>  
    783 793 <section id="dialect.summary">  
    784 794 <?dbhtml filename="html_processing/summary.html"?>  
    785   <title>Summary</title>  
      795 <title>小结</title>  
    785 795 <abstract>  
    786 796 <title/>  
    787   <para>&python; provides you with a powerful tool, &sgmllib_filename;, to manipulate &html; by turning its structure into an object model.  You can use this tool in many different ways.</para>  
      797 <para>&python; 向您提供了一个强大工具,&sgmllib_filename;,可以通过将 &html; 结构转变为一种对象模型来进行处理。可以以许多不同的方式来使用这个工具。</para>  
    787 797 </abstract>  
    788 798 <itemizedlist>  
    789   <listitem><para>parsing the &html; looking for something specific</para></listitem>  
    790   <listitem><para>aggregating the results, like the <link linkend="dialect.extract.links">&url; lister</link></para></listitem>  
    791   <listitem><para>altering the structure along the way, like the <link linkend="dialect.quoting.example">attribute quoter</link></para></listitem>  
    792   <listitem><para>transforming the &html; into something else by manipulating the text while leaving the tags alone, like the <link linkend="dialect.dialectizer"><classname>Dialectizer</classname></link></para></listitem>  
      799 <listitem><para>对 &html; 进行分析,搜索特别的东西</para></listitem>  
      800 <listitem><para>汇集结果,如 <link linkend="dialect.extract.links">&url; lister</link></para></listitem>  
      801 <listitem><para>在处理过程中修改其结构,如 <link linkend="dialect.quoting.example">给属性加引号的程序</link></para></listitem>  
      802 <listitem><para>将 &html; 转换为其它的东西,通过对文本进行处理,同时保留标记,如 <link linkend="dialect.dialectizer"><classname>Dialectizer</classname></link></para></listitem>  
    793 803 </itemizedlist>  
    794   <para>Along with these examples, you should be comfortable doing all of the following things:</para>  
      804 <para>学过了这些例子之后,您应该能够自如地完成下面的事情:</para>  
    794 804 <itemizedlist>  
    795   <listitem><para>Using <link linkend="dialect.locals">&locals;() and &globals;()</link> to access namespaces</para></listitem>  
    796   <listitem><para><link linkend="dialect.dictsub">Formatting strings</link> using dictionary-based substitutions</para></listitem>  
      805 <listitem><para>使用 <link linkend="dialect.locals">&locals;() 和 &globals;()</link> 来访问名字空间</para></listitem>  
      806 <listitem><para>使用基于 dictionary 替换的 <link linkend="dialect.dictsub">字符串格式化</link></para></listitem>  
    797 807 </itemizedlist>  
    798 808 </section>