Mercurial > vim-lawrencium
annotate resources/BeautifulSoup.py @ 126:47209552ec46
Shellescaped all command arguments in HgRepo.GetCommand,
so that the commands work properly with ugly file names,
in my case containing parentheses.
Wrapping revision arguments in quotes is no longer necessary,
so removed all of that as well.
author | namark <nshan.nnnn@gmail.com> |
---|---|
date | Wed, 02 Dec 2015 22:45:12 +0400 |
parents | f02e37f395ae |
children |
rev | line source |
---|---|
15
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
"""Beautiful Soup
Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup parses a (possibly invalid) XML or HTML document into a
tree representation. It provides methods and Pythonic idioms that make
it easy to navigate, search, and modify the tree.

A well-formed XML/HTML document yields a well-formed data
structure. An ill-formed XML/HTML document yields a correspondingly
ill-formed data structure. If your document is only locally
well-formed, you can use this library to find and process the
well-formed part of it.

Beautiful Soup works with Python 2.2 and up. It has no external
dependencies, but you'll have more success at converting data to UTF-8
if you also install these three packages:

* chardet, for auto-detecting character encodings
  http://chardet.feedparser.org/
* cjkcodecs and iconv_codec, which add more encodings to the ones supported
  by stock Python.
  http://cjkpython.i18n.org/

Beautiful Soup defines classes for two main parsing strategies:

 * BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific
   language that kind of looks like XML.

 * BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid
   or invalid. This class has web browser-like heuristics for
   obtaining a sensible parse tree in the face of common HTML errors.

Beautiful Soup also defines a class (UnicodeDammit) for autodetecting
the encoding of an HTML or XML document, and converting it to
Unicode. Much of this code is taken from Mark Pilgrim's Universal Feed Parser.

For more than you ever wanted to know about Beautiful Soup, see the
documentation:
http://www.crummy.com/software/BeautifulSoup/documentation.html

Here, have some legalese:

Copyright (c) 2004-2010, Leonard Richardson

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

  * Redistributions of source code must retain the above copyright
    notice, this list of conditions and the following disclaimer.

  * Redistributions in binary form must reproduce the above
    copyright notice, this list of conditions and the following
    disclaimer in the documentation and/or other materials provided
    with the distribution.

  * Neither the name of the Beautiful Soup Consortium and All
    Night Kosher Bakery nor the names of its contributors may be
    used to endorse or promote products derived from this software
    without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE, DAMMIT.

"""
from __future__ import generators

__author__ = "Leonard Richardson (leonardr@segfault.org)"
__version__ = "3.2.0"
__copyright__ = "Copyright (c) 2004-2010 Leonard Richardson"
__license__ = "New-style BSD"

from sgmllib import SGMLParser, SGMLParseError
import codecs
import markupbase
import types
import re
import sgmllib
try:
    from htmlentitydefs import name2codepoint
except ImportError:
    name2codepoint = {}
try:
    set
except NameError:
    from sets import Set as set

# These hacks make Beautiful Soup able to parse XML with namespaces
sgmllib.tagfind = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*')
markupbase._declname_match = re.compile(r'[a-zA-Z][-_.:a-zA-Z0-9]*\s*').match

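To see what the patched pattern accepts, the sketch below applies the same regular expression on its own, without touching `sgmllib`. The helper name `leading_tag_name` is hypothetical and exists only for this illustration; the point is that the colon in the character class lets a namespaced name such as `xsl:template` match as a single token.

```python
import re

# Same pattern the listing assigns to sgmllib.tagfind: a letter followed by
# letters, digits, '-', '_', '.' or ':'.
TAG_NAME = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*')


def leading_tag_name(text):
    """Return the tag name at the start of `text`, or None (hypothetical helper)."""
    match = TAG_NAME.match(text)
    return match.group(0) if match else None


print(leading_tag_name("xsl:template match='/'"))  # xsl:template
print(leading_tag_name("1abc"))                    # None
```

A name must still start with a letter, so input beginning with a digit is rejected.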
DEFAULT_OUTPUT_ENCODING = "utf-8"

def _match_css_class(str):
    """Build a RE to match the given CSS class."""
    return re.compile(r"(^|.*\s)%s($|\s)" % str)

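The following sketch exercises the pattern built by `_match_css_class` against a few `class` attribute values. It copies the pattern into a standalone function so it runs without the library. `match_css_class_escaped` is a hypothetical variant, not part of the listing, added to show that the class name is interpolated as-is, so regex metacharacters in it keep their special meaning.

```python
import re


def match_css_class(name):
    """Same pattern as _match_css_class in the listing."""
    return re.compile(r"(^|.*\s)%s($|\s)" % name)


def match_css_class_escaped(name):
    """Hypothetical variant that treats the class name literally."""
    return re.compile(r"(^|.*\s)%s($|\s)" % re.escape(name))


nav = match_css_class("nav")
print(bool(nav.match("nav")))           # True
print(bool(nav.match("menu nav big")))  # True
print(bool(nav.match("navbar")))        # False

# '.' is a wildcard in the unescaped pattern, so "a.b" also matches "axb".
print(bool(match_css_class("a.b").match("axb")))          # True
print(bool(match_css_class_escaped("a.b").match("axb")))  # False
```

The class name must be delimited by whitespace or the ends of the string, which is why `navbar` does not match `nav`.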
# First, the classes that represent markup elements.

class PageElement(object):
    """Contains the navigational information for some part of the page
    (either a tag or a piece of text)"""

    def setup(self, parent=None, previous=None):
        """Sets up the initial relations between this element and
        other elements."""
        self.parent = parent
        self.previous = previous
        self.next = None
        self.previousSibling = None
        self.nextSibling = None
        if self.parent and self.parent.contents:
            self.previousSibling = self.parent.contents[-1]
            self.previousSibling.nextSibling = self

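A minimal sketch of how `setup` wires siblings together, using a hypothetical `Node` class that carries only the fields `setup` touches. The helper `append_child` is also an assumption made for this example. It calls `setup` before appending the child to `contents`, because `setup` reads `parent.contents[-1]` as the previous sibling; calling it afterwards would make the element its own sibling.

```python
class Node(object):
    """Hypothetical stand-in for PageElement; only the fields setup() touches."""

    def __init__(self, name):
        self.name = name
        self.contents = []

    def setup(self, parent=None, previous=None):
        # Mirrors PageElement.setup from the listing.
        self.parent = parent
        self.previous = previous
        self.next = None
        self.previousSibling = None
        self.nextSibling = None
        if self.parent and self.parent.contents:
            self.previousSibling = self.parent.contents[-1]
            self.previousSibling.nextSibling = self


def append_child(parent, child, previous):
    """Hypothetical helper: link the child first, then add it to contents."""
    child.setup(parent, previous)
    previous.next = child
    parent.contents.append(child)


root = Node("root")
root.setup()
a = Node("a")
append_child(root, a, previous=root)
b = Node("b")
append_child(root, b, previous=a)

print(a.nextSibling.name)      # b
print(b.previousSibling.name)  # a
```

Here `previous`/`next` follow document order across the whole tree, while `previousSibling`/`nextSibling` only connect children of the same parent.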
    def replaceWith(self, replaceWith):
        """Replaces this element with replaceWith in the parent's contents."""
        oldParent = self.parent
        myIndex = self.parent.index(self)
        if (hasattr(replaceWith, "parent")
                and replaceWith.parent is self.parent):
            # We're replacing this element with one of its siblings.
            index = replaceWith.parent.index(replaceWith)
            if index and index < myIndex:
                # Furthermore, it comes before this element. That
                # means that when we extract it, the index of this
                # element will change.
                myIndex = myIndex - 1
        self.extract()
        oldParent.insert(myIndex, replaceWith)

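The index adjustment in `replaceWith` can be illustrated on a plain Python list. This is a sketch of the idea only, not the library's code: `replace_in_list` is a hypothetical function, and it compares items by equality where the listing works on tree elements. Unlike the listing, whose `index and index < myIndex` test skips a sibling at index 0, the sketch adjusts for every earlier position.

```python
def replace_in_list(items, target, replacement):
    """Put `replacement` where `target` was.

    `replacement` may already be in `items` (a "sibling"), in which case it
    is moved rather than duplicated.
    """
    position = items.index(target)
    if replacement in items:
        if items.index(replacement) < position:
            # Removing an earlier item shifts the target one slot left.
            position -= 1
        items.remove(replacement)
    items.remove(target)
    items.insert(position, replacement)
    return items


print(replace_in_list(["a", "b", "c", "d"], "c", "a"))  # ['b', 'a', 'd']
print(replace_in_list(["a", "b", "c"], "b", "x"))       # ['a', 'x', 'c']
```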
    def replaceWithChildren(self):
        """Replaces this element with its own children, keeping their order."""
        myParent = self.parent
        myIndex = self.parent.index(self)
        self.extract()
        # Insert the children last-to-first at the same index so that they
        # end up in their original order.
        reversedChildren = list(self.contents)
        reversedChildren.reverse()
        for child in reversedChildren:
            myParent.insert(myIndex, child)

    def extract(self):
        """Destructively rips this element out of the tree."""
        if self.parent:
            try:
                del self.parent.contents[self.parent.index(self)]
            except ValueError:
                pass

        # Find the two elements that would be next to each other if
        # this element (and any children) hadn't been parsed. Connect
        # the two.
        lastChild = self._lastRecursiveChild()
        nextElement = lastChild.next

        if self.previous:
            self.previous.next = nextElement
        if nextElement:
            nextElement.previous = self.previous
        self.previous = None
        lastChild.next = None

        self.parent = None
        if self.previousSibling:
            self.previousSibling.nextSibling = self.nextSibling
        if self.nextSibling:
            self.nextSibling.previousSibling = self.previousSibling
        self.previousSibling = self.nextSibling = None
        return self

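The `previous`/`next` bookkeeping in `extract` amounts to cutting a contiguous run out of a doubly linked chain. The sketch below shows that step in isolation. `Item`, `link`, and `cut_span` are hypothetical names for this illustration, and the run `first..last` plays the role of the extracted element and its last recursive child.

```python
class Item(object):
    """Hypothetical node in a doubly linked chain (document order)."""

    def __init__(self, label):
        self.label = label
        self.previous = None
        self.next = None


def link(items):
    """Chain the items together in the order given."""
    for left, right in zip(items, items[1:]):
        left.next = right
        right.previous = left


def cut_span(first, last):
    """Detach the run first..last and reconnect its two neighbours."""
    after = last.next
    if first.previous:
        first.previous.next = after
    if after:
        after.previous = first.previous
    first.previous = None
    last.next = None
    return first


a, b, c, d = [Item(label) for label in "abcd"]
link([a, b, c, d])
cut_span(b, c)
print(a.next.label)  # d
```

The links inside the removed run are left intact, so the detached subtree can still be traversed on its own.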
182 def _lastRecursiveChild(self): |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
183 "Finds the last element beneath this object to be parsed." |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
184 lastChild = self |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
185 while hasattr(lastChild, 'contents') and lastChild.contents: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
186 lastChild = lastChild.contents[-1] |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
187 return lastChild |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
188 |
    def insert(self, position, newChild):
        if isinstance(newChild, basestring) \
            and not isinstance(newChild, NavigableString):
            newChild = NavigableString(newChild)

        position = min(position, len(self.contents))
        if hasattr(newChild, 'parent') and newChild.parent is not None:
            # We're 'inserting' an element that's already one
            # of this object's children.
            if newChild.parent is self:
                index = self.index(newChild)
                if index > position:
                    # Furthermore we're moving it further down the
                    # list of this object's children. That means that
                    # when we extract this element, our target index
                    # will jump down one.
                    position = position - 1
            newChild.extract()

        newChild.parent = self
        previousChild = None
        if position == 0:
            newChild.previousSibling = None
            newChild.previous = self
        else:
            previousChild = self.contents[position-1]
            newChild.previousSibling = previousChild
            newChild.previousSibling.nextSibling = newChild
            newChild.previous = previousChild._lastRecursiveChild()
        if newChild.previous:
            newChild.previous.next = newChild

        newChildsLastElement = newChild._lastRecursiveChild()

        if position >= len(self.contents):
            newChild.nextSibling = None

            parent = self
            parentsNextSibling = None
            while not parentsNextSibling:
                parentsNextSibling = parent.nextSibling
                parent = parent.parent
                if not parent: # This is the last element in the document.
                    break
            if parentsNextSibling:
                newChildsLastElement.next = parentsNextSibling
            else:
                newChildsLastElement.next = None
        else:
            nextChild = self.contents[position]
            newChild.nextSibling = nextChild
            if newChild.nextSibling:
                newChild.nextSibling.previousSibling = newChild
            newChildsLastElement.next = nextChild

        if newChildsLastElement.next:
            newChildsLastElement.next.previous = newChildsLastElement
        self.contents.insert(position, newChild)

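The index adjustment that `insert` makes when an element is moved within the same parent is easier to follow on a plain Python list. What follows is a minimal standalone sketch, not BeautifulSoup code: `move_before` and its arguments are hypothetical names introduced only for illustration. It moves an item so that it lands immediately before whatever currently occupies `position`. Removing the item first shifts every later index down by one, so the sketch decrements the target when the item currently sits before it. The listing compares `index > position`, whereas this sketch uses `index < position`, since that is the case in which removal actually changes the target index; the difference may be worth checking against the intended behavior.

```python
def move_before(items, item, position):
    """Move `item` so it lands just before the element now at `position`.

    Hypothetical helper for illustration; it is not part of BeautifulSoup.
    """
    position = min(position, len(items))
    index = items.index(item)
    if index < position:
        # Removing the item shifts everything after it down by one,
        # so the target index has to follow.
        position -= 1
    items.pop(index)
    items.insert(position, item)
    return items


children = ["a", "b", "c", "d"]
move_before(children, "a", 2)   # "a" should end up just before "c"
print(children)
```

Moving an item toward the front needs no adjustment, because the elements before the target are not disturbed by the removal.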
    def append(self, tag):
        """Appends the given tag to the contents of this tag."""
        self.insert(len(self.contents), tag)

    def findNext(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the first item that matches the given criteria and
        appears after this Tag in the document."""
        return self._findOne(self.findAllNext, name, attrs, text, **kwargs)

    def findAllNext(self, name=None, attrs={}, text=None, limit=None,
                    **kwargs):
        """Returns all items that match the given criteria and appear
        after this Tag in the document."""
        return self._findAll(name, attrs, text, limit, self.nextGenerator,
                             **kwargs)

    def findNextSibling(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the closest sibling to this Tag that matches the
        given criteria and appears after this Tag in the document."""
        return self._findOne(self.findNextSiblings, name, attrs, text,
                             **kwargs)

    def findNextSiblings(self, name=None, attrs={}, text=None, limit=None,
                         **kwargs):
        """Returns the siblings of this Tag that match the given
        criteria and appear after this Tag in the document."""
        return self._findAll(name, attrs, text, limit,
                             self.nextSiblingGenerator, **kwargs)
    fetchNextSiblings = findNextSiblings # Compatibility with pre-3.x

    def findPrevious(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the first item that matches the given criteria and
        appears before this Tag in the document."""
        return self._findOne(self.findAllPrevious, name, attrs, text, **kwargs)

    def findAllPrevious(self, name=None, attrs={}, text=None, limit=None,
                        **kwargs):
        """Returns all items that match the given criteria and appear
        before this Tag in the document."""
        return self._findAll(name, attrs, text, limit, self.previousGenerator,
                             **kwargs)
    fetchPrevious = findAllPrevious # Compatibility with pre-3.x

    def findPreviousSibling(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the closest sibling to this Tag that matches the
        given criteria and appears before this Tag in the document."""
        return self._findOne(self.findPreviousSiblings, name, attrs, text,
                             **kwargs)

    def findPreviousSiblings(self, name=None, attrs={}, text=None,
                             limit=None, **kwargs):
        """Returns the siblings of this Tag that match the given
        criteria and appear before this Tag in the document."""
        return self._findAll(name, attrs, text, limit,
                             self.previousSiblingGenerator, **kwargs)
    fetchPreviousSiblings = findPreviousSiblings # Compatibility with pre-3.x

    def findParent(self, name=None, attrs={}, **kwargs):
        """Returns the closest parent of this Tag that matches the given
        criteria."""
        # NOTE: We can't use _findOne because findParents takes a different
        # set of arguments.
        r = None
        l = self.findParents(name, attrs, 1)
        if l:
            r = l[0]
        return r

    def findParents(self, name=None, attrs={}, limit=None, **kwargs):
        """Returns the parents of this Tag that match the given
        criteria."""

        return self._findAll(name, attrs, None, limit, self.parentGenerator,
                             **kwargs)
    fetchParents = findParents # Compatibility with pre-3.x

    #These methods do the real heavy lifting.

    def _findOne(self, method, name, attrs, text, **kwargs):
        r = None
        l = method(name, attrs, text, 1, **kwargs)
        if l:
            r = l[0]
        return r

    def _findAll(self, name, attrs, text, limit, generator, **kwargs):
        "Iterates over a generator looking for things that match."

        if isinstance(name, SoupStrainer):
            strainer = name
        # (Possibly) special case some findAll*(...) searches
        elif text is None and not limit and not attrs and not kwargs:
            # findAll*(True)
            if name is True:
                return [element for element in generator()
                        if isinstance(element, Tag)]
            # findAll*('tag-name')
            elif isinstance(name, basestring):
                return [element for element in generator()
                        if isinstance(element, Tag) and
                        element.name == name]
            else:
                strainer = SoupStrainer(name, attrs, text, **kwargs)
        # Build a SoupStrainer
        else:
            strainer = SoupStrainer(name, attrs, text, **kwargs)
        results = ResultSet(strainer)
        g = generator()
        while True:
            try:
                i = g.next()
            except StopIteration:
                break
            if i:
                found = strainer.search(i)
                if found:
                    results.append(found)
                    if limit and len(results) >= limit:
                        break
        return results

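The relationship between `_findOne` and `_findAll` can be sketched without any of the tree machinery. This is an illustrative, standalone Python 3 example under stated assumptions: `find_all`, `find_one`, and the `matches` predicate are hypothetical names standing in for the two methods and for the strainer's matching step; none of them is BeautifulSoup API.

```python
def find_all(candidates, matches, limit=None):
    """Collect items from an iterable that satisfy `matches`, up to `limit`."""
    results = []
    for item in candidates:
        if item is None:
            # The navigation generators in the listing end by yielding None.
            continue
        if matches(item):
            results.append(item)
            if limit and len(results) >= limit:
                break
    return results


def find_one(candidates, matches):
    """Return the first match, or None when nothing matches."""
    found = find_all(candidates, matches, limit=1)
    return found[0] if found else None


words = ["div", "span", None, "p", "span"]
print(find_all(words, lambda w: w == "span"))
print(find_one(words, lambda w: w.startswith("p")))
print(find_one(words, lambda w: w == "table"))
```

Delegating the single-result search to the list-returning search with a limit of 1 keeps the matching logic in one place, at the cost of building a short list for every lookup.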
    #These Generators can be used to navigate starting from both
    #NavigableStrings and Tags.
    def nextGenerator(self):
        i = self
        while i is not None:
            i = i.next
            yield i

    def nextSiblingGenerator(self):
        i = self
        while i is not None:
            i = i.nextSibling
            yield i

f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
383 def previousGenerator(self): |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
384 i = self |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
385 while i is not None: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
386 i = i.previous |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
387 yield i |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
388 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
389 def previousSiblingGenerator(self): |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
390 i = self |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
391 while i is not None: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
392 i = i.previousSibling |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
393 yield i |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
394 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
395 def parentGenerator(self): |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
396 i = self |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
397 while i is not None: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
398 i = i.parent |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
399 yield i |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
400 |
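The generators above advance before they yield, so the starting element is skipped and the final value produced is `None`; the search loop earlier in the file discards it with `if i:`. The following is a minimal standalone sketch of that interaction in Python 3, not the library's own code. `Node`, `next_generator`, and `collect` are hypothetical names introduced only for illustration.

```python
class Node:
    """Hypothetical stand-in for a parse-tree element."""

    def __init__(self, name):
        self.name = name
        self.next = None


def next_generator(node):
    # Same loop shape as nextGenerator: advance first, then yield,
    # so the start node is skipped and a trailing None is yielded.
    i = node
    while i is not None:
        i = i.next
        yield i


def collect(node, limit=None):
    # Mirrors the consuming loop: falsy items (the final None) are skipped,
    # and iteration stops early once `limit` results are gathered.
    results = []
    for i in next_generator(node):
        if i:
            results.append(i.name)
            if limit and len(results) >= limit:
                break
    return results


a, b, c = Node("a"), Node("b"), Node("c")
a.next, b.next = b, c

# Raw generator output, including the trailing None.
raw = [n.name if n is not None else None for n in next_generator(a)]
```

Any caller that iterates one of these generators directly has to tolerate that trailing `None`, which is why the consuming loop tests each item before using it.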
    # Utility methods
    def substituteEncoding(self, str, encoding=None):
        encoding = encoding or "utf-8"
        return str.replace("%SOUP-ENCODING%", encoding)

    def toEncoding(self, s, encoding=None):
        """Encodes an object to a string in some encoding, or to Unicode."""
        if isinstance(s, unicode):
            if encoding:
                s = s.encode(encoding)
        elif isinstance(s, str):
            if encoding:
                s = s.encode(encoding)
            else:
                s = unicode(s)
        else:
            if encoding:
                s = self.toEncoding(str(s), encoding)
            else:
                s = unicode(s)
        return s

class NavigableString(unicode, PageElement):

    def __new__(cls, value):
        """Create a new NavigableString.

        When unpickling a NavigableString, this method is called with
        the string in DEFAULT_OUTPUT_ENCODING. That encoding needs to be
        passed in to the superclass's __new__ or the superclass won't know
        how to handle non-ASCII characters.
        """
        if isinstance(value, unicode):
            return unicode.__new__(cls, value)
        return unicode.__new__(cls, value, DEFAULT_OUTPUT_ENCODING)

    def __getnewargs__(self):
        return (NavigableString.__str__(self),)

    def __getattr__(self, attr):
        """text.string gives you text. This is for backwards
        compatibility for Navigable*String, but for CData* it lets you
        get the string without the CData wrapper."""
        if attr == 'string':
            return self
        else:
            raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr)

    def __unicode__(self):
        return str(self).decode(DEFAULT_OUTPUT_ENCODING)

    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        if encoding:
            return self.encode(encoding)
        else:
            return self

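The `__new__` docstring above describes a pickling round trip: `__getnewargs__` hands the pickler an encoded byte string, and `__new__` must decode it again on load. Below is a rough Python 3 sketch of the same idea, written against the built-in `str` rather than the Python 2 `unicode` type used in this file. `EncodedText` and `ENCODING` are hypothetical names; `ENCODING` merely stands in for `DEFAULT_OUTPUT_ENCODING`.

```python
import pickle

ENCODING = "utf-8"  # assumption: stands in for DEFAULT_OUTPUT_ENCODING


class EncodedText(str):
    """Hypothetical str subclass that pickles itself as encoded bytes."""

    def __new__(cls, value):
        if isinstance(value, str):
            return str.__new__(cls, value)
        # Bytes arrive here when unpickling; decode with the known encoding.
        return str.__new__(cls, value, ENCODING)

    def __getnewargs__(self):
        # The pickler passes these arguments back to __new__ on load.
        return (self.encode(ENCODING),)


original = EncodedText("caf\u00e9")
restored = pickle.loads(pickle.dumps(original))
```

Without the encoding argument in `__new__`, the bytes produced by `__getnewargs__` could not be turned back into text containing non-ASCII characters, which is the failure the docstring warns about.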
class CData(NavigableString):

    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return "<![CDATA[%s]]>" % NavigableString.__str__(self, encoding)

class ProcessingInstruction(NavigableString):
    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        output = self
        if "%SOUP-ENCODING%" in output:
            output = self.substituteEncoding(output, encoding)
        return "<?%s?>" % self.toEncoding(output, encoding)

class Comment(NavigableString):
    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return "<!--%s-->" % NavigableString.__str__(self, encoding)

class Declaration(NavigableString):
    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return "<!%s>" % NavigableString.__str__(self, encoding)

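Each subclass above keeps the raw text as its string value and adds the markup wrapper only when rendered. A small Python 3 sketch of that split follows. It uses a separate `render` method rather than overriding `__str__`, and the class names are hypothetical, so it illustrates the pattern rather than reproducing the classes in this file.

```python
class Text(str):
    """Hypothetical text node: the string value is the raw text."""

    def render(self):
        return str(self)


class CDataText(Text):
    def render(self):
        # Wrapper is added only in the rendered form.
        return "<![CDATA[%s]]>" % str(self)


class CommentText(Text):
    def render(self):
        return "<!--%s-->" % str(self)


nodes = (Text("x < y"), CDataText("x < y"), CommentText("note"))
rendered = [node.render() for node in nodes]
```

Because the wrapper lives only in the rendering method, comparing a `CDataText` against a plain string still compares the unwrapped text.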
class Tag(PageElement):

    """Represents a found HTML tag with its attributes and contents."""

    def _invert(h):
        "Cheap function to invert a hash."
        i = {}
        for k,v in h.items():
            i[v] = k
        return i

    XML_ENTITIES_TO_SPECIAL_CHARS = { "apos" : "'",
                                      "quot" : '"',
                                      "amp" : "&",
                                      "lt" : "<",
                                      "gt" : ">" }

    XML_SPECIAL_CHARS_TO_ENTITIES = _invert(XML_ENTITIES_TO_SPECIAL_CHARS)

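`_invert` builds the character-to-entity table by swapping keys and values of the entity-to-character table. A standalone Python 3 sketch of the same inversion, using a dict comprehension, might look like this. The function and variable names are illustrative and not part of the file.

```python
def invert(mapping):
    # Swap keys and values; assumes the values are unique and hashable.
    return {value: key for key, value in mapping.items()}


entities_to_chars = {"apos": "'", "quot": '"', "amp": "&", "lt": "<", "gt": ">"}
chars_to_entities = invert(entities_to_chars)
```

The inversion is only lossless because every entity maps to a distinct character; duplicate values would silently overwrite one another.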
    def _convertEntities(self, match):
        """Used in a call to re.sub to replace HTML, XML, and numeric
        entities with the appropriate Unicode characters. If HTML
        entities are being converted, any unrecognized entities are
        escaped."""
        x = match.group(1)
        if self.convertHTMLEntities and x in name2codepoint:
            return unichr(name2codepoint[x])
        elif x in self.XML_ENTITIES_TO_SPECIAL_CHARS:
            if self.convertXMLEntities:
                return self.XML_ENTITIES_TO_SPECIAL_CHARS[x]
            else:
                return u'&%s;' % x
        elif len(x) > 0 and x[0] == '#':
            # Handle numeric entities
            if len(x) > 1 and x[1] == 'x':
                return unichr(int(x[2:], 16))
            else:
                return unichr(int(x[1:]))

        elif self.escapeUnrecognizedEntities:
            # Escape the ampersand so the unrecognized entity stays literal
            return u'&amp;%s;' % x
        else:
            return u'&%s;' % x

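`_convertEntities` is written as a replacement callback for `re.sub`: the pattern captures the entity name and the callback decides what to substitute. The sketch below shows that mechanism in Python 3 with the standard `html.entities.name2codepoint` table. It is a simplified stand-in, not this file's method: `convert_entity` is a hypothetical free function, it omits the separate XML-entity branch, and it assumes unrecognized entities are escaped by default.

```python
import re
from html.entities import name2codepoint

# Same pattern shape as the one used for attribute values in this file.
ENTITY_RE = re.compile(r"&(#\d+|#x[0-9a-fA-F]+|\w+);")


def convert_entity(match, escape_unrecognized=True):
    name = match.group(1)
    if name in name2codepoint:
        # Named HTML entity, e.g. "eacute".
        return chr(name2codepoint[name])
    if name.startswith("#x"):
        # Hexadecimal numeric entity.
        return chr(int(name[2:], 16))
    if name.startswith("#"):
        # Decimal numeric entity.
        return chr(int(name[1:]))
    if escape_unrecognized:
        # Escape the ampersand so the unknown entity is kept literally.
        return "&amp;%s;" % name
    return "&%s;" % name


converted = ENTITY_RE.sub(convert_entity, "caf&eacute; &#233; &#xe9; &bogus;")
```

Passing a function to `re.sub` keeps the lookup logic in one place and lets each match be resolved independently.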
    def __init__(self, parser, name, attrs=None, parent=None,
                 previous=None):
        "Basic constructor."

        # We don't actually store the parser object: that lets extracted
        # chunks be garbage-collected
        self.parserClass = parser.__class__
        self.isSelfClosing = parser.isSelfClosingTag(name)
        self.name = name
        if attrs is None:
            attrs = []
        elif isinstance(attrs, dict):
            attrs = attrs.items()
        self.attrs = attrs
        self.contents = []
        self.setup(parent, previous)
        self.hidden = False
        self.containsSubstitutions = False
        self.convertHTMLEntities = parser.convertHTMLEntities
        self.convertXMLEntities = parser.convertXMLEntities
        self.escapeUnrecognizedEntities = parser.escapeUnrecognizedEntities

        # Convert any HTML, XML, or numeric entities in the attribute values.
        convert = lambda(k, val): (k,
                                   re.sub("&(#\d+|#x[0-9a-fA-F]+|\w+);",
                                          self._convertEntities,
                                          val))
        self.attrs = map(convert, self.attrs)

    def getString(self):
        if (len(self.contents) == 1
            and isinstance(self.contents[0], NavigableString)):
            return self.contents[0]

    def setString(self, string):
        """Replace the contents of the tag with a string"""
        self.clear()
        self.append(string)

    string = property(getString, setString)

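The `string` property returns a value only when the tag holds exactly one string child, and falls through to `None` otherwise. A compact Python 3 sketch of that behaviour is shown here. `Element` is a hypothetical class, and its setter simply replaces the child list, whereas the method above calls `clear()` and `append()`.

```python
class Element:
    """Hypothetical container with a list of children."""

    def __init__(self):
        self.contents = []

    def _get_string(self):
        # Only unambiguous when there is exactly one string child.
        if len(self.contents) == 1 and isinstance(self.contents[0], str):
            return self.contents[0]
        return None

    def _set_string(self, value):
        # Simplification: replace all children with the single string.
        self.contents = [value]

    string = property(_get_string, _set_string)


el = Element()
before = el.string          # no children yet
el.string = "hello"
after = el.string           # exactly one string child
el.contents.append("world")
ambiguous = el.string       # two children, so no single string
```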
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
    def getText(self, separator=u""):
        if not len(self.contents):
            return u""
        stopNode = self._lastRecursiveChild().next
        strings = []
        current = self.contents[0]
        while current is not stopNode:
            if isinstance(current, NavigableString):
                strings.append(current.strip())
            current = current.next
        return separator.join(strings)

    text = property(getText)

    def get(self, key, default=None):
        """Returns the value of the 'key' attribute for the tag, or
        the value given for 'default' if it doesn't have that
        attribute."""
        return self._getAttrMap().get(key, default)

    def clear(self):
        """Extract all children."""
        for child in self.contents[:]:
            child.extract()

    def index(self, element):
        for i, child in enumerate(self.contents):
            if child is element:
                return i
        raise ValueError("Tag.index: element not in tag")

    def has_key(self, key):
        return self._getAttrMap().has_key(key)

    def __getitem__(self, key):
        """tag[key] returns the value of the 'key' attribute for the tag,
        and throws an exception if it's not there."""
        return self._getAttrMap()[key]

    def __iter__(self):
        "Iterating over a tag iterates over its contents."
        return iter(self.contents)

    def __len__(self):
        "The length of a tag is the length of its list of contents."
        return len(self.contents)

    def __contains__(self, x):
        return x in self.contents

    def __nonzero__(self):
        "A tag is non-None even if it has no contents."
        return True

    def __setitem__(self, key, value):
        """Setting tag[key] sets the value of the 'key' attribute for the
        tag."""
        self._getAttrMap()
        self.attrMap[key] = value
        found = False
        for i in range(0, len(self.attrs)):
            if self.attrs[i][0] == key:
                self.attrs[i] = (key, value)
                found = True
        if not found:
            self.attrs.append((key, value))
        self._getAttrMap()[key] = value

    def __delitem__(self, key):
        "Deleting tag[key] deletes all 'key' attributes for the tag."
        # Iterate over a copy: removing from the list being iterated
        # would skip the entry that follows each removed one.
        for item in self.attrs[:]:
            if item[0] == key:
                self.attrs.remove(item)
                #We don't break because bad HTML can define the same
                #attribute multiple times.
            self._getAttrMap()
            if self.attrMap.has_key(key):
                del self.attrMap[key]

    def __call__(self, *args, **kwargs):
        """Calling a tag like a function is the same as calling its
        findAll() method. E.g. tag('a') returns a list of all the A tags
        found within this tag."""
        # Argument unpacking is equivalent to the deprecated apply().
        return self.findAll(*args, **kwargs)

    def __getattr__(self, tag):
        #print "Getattr %s.%s" % (self.__class__, tag)
        if len(tag) > 3 and tag.rfind('Tag') == len(tag)-3:
            return self.find(tag[:-3])
        elif tag.find('__') != 0:
            return self.find(tag)
        raise AttributeError("'%s' object has no attribute '%s'" % (self.__class__, tag))

    def __eq__(self, other):
        """Returns true iff this tag has the same name, the same attributes,
        and the same contents (recursively) as the given tag.

        NOTE: right now this will return false if two tags have the
        same attributes in a different order. Should this be fixed?"""
        if other is self:
            return True
        if not hasattr(other, 'name') or not hasattr(other, 'attrs') or not hasattr(other, 'contents') or self.name != other.name or self.attrs != other.attrs or len(self) != len(other):
            return False
        for i in range(0, len(self.contents)):
            if self.contents[i] != other.contents[i]:
                return False
        return True

    def __ne__(self, other):
        """Returns true iff this tag is not identical to the other tag,
        as defined in __eq__."""
        return not self == other

    def __repr__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        """Renders this tag as a string."""
        return self.__str__(encoding)

    def __unicode__(self):
        return self.__str__(None)

    BARE_AMPERSAND_OR_BRACKET = re.compile("([<>]|"
                                           + "&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)"
                                           + ")")

    def _sub_entity(self, x):
        """Used with a regular expression to substitute the
        appropriate XML entity for an XML special character."""
        return "&" + self.XML_SPECIAL_CHARS_TO_ENTITIES[x.group(0)[0]] + ";"

    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING,
                prettyPrint=False, indentLevel=0):
        """Returns a string or Unicode representation of this tag and
        its contents. To get Unicode, pass None for encoding.

        NOTE: since Python's HTML parser consumes whitespace, this
        method is not certain to reproduce the whitespace present in
        the original string."""

        encodedName = self.toEncoding(self.name, encoding)

        attrs = []
        if self.attrs:
            for key, val in self.attrs:
                fmt = '%s="%s"'
                if isinstance(val, basestring):
                    if self.containsSubstitutions and '%SOUP-ENCODING%' in val:
                        val = self.substituteEncoding(val, encoding)

                    # The attribute value either:
                    #
                    # * Contains no embedded double quotes or single quotes.
                    #   No problem: we enclose it in double quotes.
                    # * Contains embedded single quotes. No problem:
                    #   double quotes work here too.
                    # * Contains embedded double quotes. No problem:
                    #   we enclose it in single quotes.
                    # * Embeds both single _and_ double quotes. This
                    #   can't happen naturally, but it can happen if
                    #   you modify an attribute value after parsing
                    #   the document. Now we have a bit of a
                    #   problem. We solve it by enclosing the
                    #   attribute in single quotes, and escaping any
                    #   embedded single quotes to XML entities.
                    if '"' in val:
                        fmt = "%s='%s'"
                        if "'" in val:
                            # TODO: replace with apos when
                            # appropriate.
                            val = val.replace("'", "&squot;")

                    # Now we're okay w/r/t quotes. But the attribute
                    # value might also contain angle brackets, or
                    # ampersands that aren't part of entities. We need
                    # to escape those to XML entities too.
                    val = self.BARE_AMPERSAND_OR_BRACKET.sub(self._sub_entity, val)

                attrs.append(fmt % (self.toEncoding(key, encoding),
                                    self.toEncoding(val, encoding)))
        close = ''
        closeTag = ''
        if self.isSelfClosing:
            close = ' /'
        else:
            closeTag = '</%s>' % encodedName

        indentTag, indentContents = 0, 0
        if prettyPrint:
            indentTag = indentLevel
            space = (' ' * (indentTag-1))
            indentContents = indentTag + 1
        contents = self.renderContents(encoding, prettyPrint, indentContents)
        if self.hidden:
            s = contents
        else:
            s = []
            attributeString = ''
            if attrs:
                attributeString = ' ' + ' '.join(attrs)
            if prettyPrint:
                s.append(space)
            s.append('<%s%s%s>' % (encodedName, attributeString, close))
            if prettyPrint:
                s.append("\n")
            s.append(contents)
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
768 if prettyPrint and contents and contents[-1] != "\n": |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
769 s.append("\n") |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
770 if prettyPrint and closeTag: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
771 s.append(space) |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
772 s.append(closeTag) |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
773 if prettyPrint and closeTag and self.nextSibling: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
774 s.append("\n") |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
775 s = ''.join(s) |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
776 return s |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
777 |
    def decompose(self):
        """Recursively destroys the contents of this tree."""
        self.extract()
        if len(self.contents) == 0:
            return
        current = self.contents[0]
        while current is not None:
            next = current.next
            if isinstance(current, Tag):
                del current.contents[:]
            current.parent = None
            current.previous = None
            current.previousSibling = None
            current.next = None
            current.nextSibling = None
            current = next

    def prettify(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return self.__str__(encoding, True)

    def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                       prettyPrint=False, indentLevel=0):
        """Renders the contents of this tag as a string in the given
        encoding. If encoding is None, returns a Unicode string."""
        s = []
        for c in self:
            text = None
            if isinstance(c, NavigableString):
                text = c.__str__(encoding)
            elif isinstance(c, Tag):
                s.append(c.__str__(encoding, prettyPrint, indentLevel))
            if text and prettyPrint:
                text = text.strip()
            if text:
                if prettyPrint:
                    s.append(" " * (indentLevel-1))
                s.append(text)
                if prettyPrint:
                    s.append("\n")
        return ''.join(s)

    #Soup methods

    def find(self, name=None, attrs={}, recursive=True, text=None,
             **kwargs):
        """Return only the first child of this Tag matching the given
        criteria."""
        r = None
        l = self.findAll(name, attrs, recursive, text, 1, **kwargs)
        if l:
            r = l[0]
        return r
    findChild = find

    def findAll(self, name=None, attrs={}, recursive=True, text=None,
                limit=None, **kwargs):
        """Extracts a list of Tag objects that match the given
        criteria. You can specify the name of the Tag and any
        attributes you want the Tag to have.

        The value of a key-value pair in the 'attrs' map can be a
        string, a list of strings, a regular expression object, or a
        callable that takes a string and returns whether or not the
        string matches for some custom definition of 'matches'. The
        same is true of the tag name."""
        generator = self.recursiveChildGenerator
        if not recursive:
            generator = self.childGenerator
        return self._findAll(name, attrs, text, limit, generator, **kwargs)
    findChildren = findAll

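The `findAll` docstring lists four kinds of criterion: a string, a list of strings, a regular expression object, or a callable. The sketch below illustrates those rules in isolation. It is written for Python 3, and the function name `matches` is hypothetical; it is a simplified stand-in, not this module's own matching routine, which may handle more cases.

```python
import re


def matches(value, criterion):
    """Hypothetical helper: test one string value against one criterion."""
    # A callable decides for itself what "matches" means.
    if callable(criterion):
        return bool(criterion(value))
    # Assumption of this sketch: a missing value matches nothing else.
    if value is None:
        return False
    # A compiled regular expression object exposes .search().
    if hasattr(criterion, "search"):
        return criterion.search(value) is not None
    # A list of strings matches if the value is one of them.
    if isinstance(criterion, (list, tuple)):
        return value in criterion
    # Otherwise fall back to plain string equality.
    return value == criterion
```

For example, `matches("title", re.compile("^t"))` and `matches("b", ["a", "b"])` both hold under these rules, while `matches("b", "a")` does not.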
    # Pre-3.x compatibility methods
    first = find
    fetch = findAll

    def fetchText(self, text=None, recursive=True, limit=None):
        return self.findAll(text=text, recursive=recursive, limit=limit)

    def firstText(self, text=None, recursive=True):
        return self.find(text=text, recursive=recursive)

    #Private methods

    def _getAttrMap(self):
        """Initializes a map representation of this tag's attributes,
        if not already initialized."""
        if not getattr(self, 'attrMap'):
            self.attrMap = {}
            for (key, value) in self.attrs:
                self.attrMap[key] = value
        return self.attrMap

    #Generator methods
    def childGenerator(self):
        # Just use the iterator from the contents
        return iter(self.contents)

    def recursiveChildGenerator(self):
        if not len(self.contents):
            raise StopIteration
        stopNode = self._lastRecursiveChild().next
        current = self.contents[0]
        while current is not stopNode:
            yield current
            current = current.next


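`recursiveChildGenerator` walks the `next` links from the first child and stops at the node that follows the last descendant. The sketch below reproduces that idea on a toy tree, as an illustration only. `Node`, `link`, `last_recursive_child`, `recursive_children` and `build_example` are hypothetical names, not part of this module. It targets Python 3, where a bare `raise StopIteration` inside a generator becomes a `RuntimeError` (PEP 479), so it ends the generator with `return` instead.

```python
class Node:
    """Hypothetical stand-in for a parse-tree node (not this module's Tag)."""

    def __init__(self, name):
        self.name = name
        self.contents = []  # direct children
        self.next = None    # successor in document order


def link(root):
    """Point each node's `next` at its successor in pre-order."""
    order = []

    def visit(node):
        order.append(node)
        for child in node.contents:
            visit(child)

    visit(root)
    for a, b in zip(order, order[1:]):
        a.next = b


def last_recursive_child(node):
    """Follow the last child repeatedly to reach the deepest final node."""
    while node.contents:
        node = node.contents[-1]
    return node


def recursive_children(node):
    """Yield every descendant of `node` in document order."""
    if not node.contents:
        return
    stop = last_recursive_child(node).next
    current = node.contents[0]
    while current is not stop:
        yield current
        current = current.next


def build_example():
    """Build root -> [a -> [b, c], d] and link it in document order."""
    root, a, b, c, d = (Node(n) for n in ("root", "a", "b", "c", "d"))
    a.contents = [b, c]
    root.contents = [a, d]
    link(root)
    return root, a
```

Because the stop node is computed once before the loop, the walk needs no recursion and no stack: it follows a single linked chain until it reaches the boundary.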
# Next, a couple classes to represent queries and their results.
class SoupStrainer:
    """Encapsulates a number of ways of matching a markup element (tag or
    text)."""

    def __init__(self, name=None, attrs={}, text=None, **kwargs):
        self.name = name
        if isinstance(attrs, basestring):
            kwargs['class'] = _match_css_class(attrs)
            attrs = None
        if kwargs:
            if attrs:
                attrs = attrs.copy()
                attrs.update(kwargs)
            else:
                attrs = kwargs
        self.attrs = attrs
        self.text = text

    def __str__(self):
        if self.text:
            return self.text
        else:
            return "%s|%s" % (self.name, self.attrs)

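The constructor above folds keyword arguments into the `attrs` map and treats a bare string as a CSS class criterion. The sketch below is a minimal Python 3 version of that merging step. `merge_criteria` is a hypothetical name, and the sketch stores the raw class string where the original wraps it with `_match_css_class`.

```python
def merge_criteria(attrs=None, **kwargs):
    """Hypothetical helper: combine an attrs map with keyword criteria."""
    if isinstance(attrs, str):
        # A plain string is shorthand for a match on the 'class' attribute.
        # Simplification: keep the raw string instead of building a matcher.
        kwargs["class"] = attrs
        attrs = None
    if kwargs:
        if attrs:
            # Copy first so the caller's dictionary is never mutated.
            attrs = dict(attrs)
            attrs.update(kwargs)
        else:
            attrs = kwargs
    return attrs
```

Copying before the update matters because the same `attrs` dictionary may be reused by the caller for several searches.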
    def searchTag(self, markupName=None, markupAttrs={}):
        found = None
        markup = None
        if isinstance(markupName, Tag):
            markup = markupName
            markupAttrs = markup
        callFunctionWithTagData = callable(self.name) \
                                and not isinstance(markupName, Tag)

        if (not self.name) \
               or callFunctionWithTagData \
               or (markup and self._matches(markup, self.name)) \
               or (not markup and self._matches(markupName, self.name)):
            if callFunctionWithTagData:
                match = self.name(markupName, markupAttrs)
            else:
                match = True
                markupAttrMap = None
                for attr, matchAgainst in self.attrs.items():
                    if not markupAttrMap:
                        if hasattr(markupAttrs, 'get'):
                            markupAttrMap = markupAttrs
                        else:
                            markupAttrMap = {}
                            for k,v in markupAttrs:
                                markupAttrMap[k] = v
                    attrValue = markupAttrMap.get(attr)
                    if not self._matches(attrValue, matchAgainst):
                        match = False
                        break
            if match:
                if markup:
                    found = markup
                else:
                    found = markupName
        return found

f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
    def search(self, markup):
        #print 'looking for %s in %s' % (self, markup)
        found = None
        # If given a list of items, scan it for a text element that
        # matches.
        if hasattr(markup, "__iter__") \
                and not isinstance(markup, Tag):
            for element in markup:
                if isinstance(element, NavigableString) \
                       and self.search(element):
                    found = element
                    break
        # If it's a Tag, make sure its name or attributes match.
        # Don't bother with Tags if we're searching for text.
        elif isinstance(markup, Tag):
            if not self.text:
                found = self.searchTag(markup)
        # If it's text, make sure the text matches.
        elif isinstance(markup, NavigableString) or \
                 isinstance(markup, basestring):
            if self._matches(markup, self.text):
                found = markup
        else:
            raise Exception, "I don't know how to match against a %s" \
                  % markup.__class__
        return found

    def _matches(self, markup, matchAgainst):
        #print "Matching %s against %s" % (markup, matchAgainst)
        result = False
        if matchAgainst is True:
            result = markup is not None
        elif callable(matchAgainst):
            result = matchAgainst(markup)
        else:
            #Custom match methods take the tag as an argument, but all
            #other ways of matching match the tag name as a string.
            if isinstance(markup, Tag):
                markup = markup.name
            if markup and not isinstance(markup, basestring):
                markup = unicode(markup)
            #Now we know that markup is either a string, or None.
            if hasattr(matchAgainst, 'match'):
                # It's a regexp object.
                result = markup and matchAgainst.search(markup)
            elif hasattr(matchAgainst, '__iter__'): # list-like
                result = markup in matchAgainst
            elif hasattr(matchAgainst, 'items'):
                result = markup.has_key(matchAgainst)
            elif matchAgainst and isinstance(markup, basestring):
                if isinstance(markup, unicode):
                    matchAgainst = unicode(matchAgainst)
                else:
                    matchAgainst = str(matchAgainst)

            if not result:
                result = matchAgainst == markup
        return result

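The dispatch order used by `_matches` above (the literal `True`, then a callable, then a compiled regular expression, then a list-like, then plain equality) can be illustrated with a small standalone sketch. This is a Python 3 approximation under stated assumptions, not the module's code: `matches` is a hypothetical free function, `str` stands in for `basestring` and `unicode`, and the `Tag` handling and the `items` branch are left out. Because strings are iterable in Python 3, the list-like test here is an explicit `isinstance` check rather than `hasattr(..., '__iter__')`.

```python
import re


def matches(markup, match_against):
    """Hypothetical Python 3 approximation of the dispatch in _matches."""
    if match_against is True:
        # True matches anything that is present at all.
        return markup is not None
    if callable(match_against):
        # Custom predicates receive the value unchanged.
        return bool(match_against(markup))
    if markup is not None and not isinstance(markup, str):
        # Everything else is compared as a string.
        markup = str(markup)
    if hasattr(match_against, "search"):
        # Compiled regular expression: search() is used, so the
        # pattern is not anchored unless it anchors itself.
        return bool(markup and match_against.search(markup))
    if isinstance(match_against, (list, tuple, set, frozenset)):
        # List-like: plain membership test.
        return markup in match_against
    # Fallback: simple equality.
    return match_against == markup


print(matches("bold", re.compile("^b")))
print(matches("a", ["a", "b"]))
print(matches(None, True))
```

The ordering matters: a callable is tried before any string conversion, so a custom predicate sees the original object, while every later branch works on a string or `None`.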
class ResultSet(list):
    """A ResultSet is just a list that keeps track of the SoupStrainer
    that created it."""
    def __init__(self, source):
        list.__init__(self)
        self.source = source

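The pattern behind `ResultSet`, a list that also remembers what produced it, can be sketched in a few lines of Python 3. `TrackedList` and the string used as `source` are hypothetical stand-ins chosen for illustration; in the module the source is a SoupStrainer.

```python
class TrackedList(list):
    """Hypothetical stand-in for ResultSet: a list that remembers its source."""

    def __init__(self, source, items=()):
        # Initialise the list part of this instance, then attach the source.
        super().__init__(items)
        self.source = source


results = TrackedList("strainer-for-<a>")
results.append("first match")
results.extend(["second match", "third match"])
print(len(results), results.source)
```

Since the subclass adds only an attribute, every ordinary list operation (indexing, slicing, iteration) keeps working unchanged.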
# Now, some helper functions.

def buildTagMap(default, *args):
    """Turns a list of maps, lists, or scalars into a single map.
    Used to build the SELF_CLOSING_TAGS, NESTABLE_TAGS, and
    RESET_NESTING_TAGS maps out of lists and partial maps."""
    built = {}
    for portion in args:
        if hasattr(portion, 'items'):
            #It's a map. Merge it.
            for k,v in portion.items():
                built[k] = v
        elif hasattr(portion, '__iter__'): # is a list
            #It's a list. Map each item to the default.
            for k in portion:
                built[k] = default
        else:
            #It's a scalar. Map it to the default.
            built[portion] = default
    return built

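To make the three input shapes handled by `buildTagMap` concrete, here is a hedged Python 3 sketch with a usage example. `build_tag_map` is a hypothetical port, not the module's function: in Python 3 strings have `__iter__`, so the scalar test is moved ahead of the list test to keep a string from being split into characters. The tag names in the call are illustrative only.

```python
def build_tag_map(default, *args):
    """Python 3 sketch of buildTagMap: merge maps, lists and scalars into one dict."""
    built = {}
    for portion in args:
        if hasattr(portion, "items"):
            # A map: merge it as-is.
            built.update(portion)
        elif isinstance(portion, str) or not hasattr(portion, "__iter__"):
            # A scalar (strings are iterable in Python 3, so test them first).
            built[portion] = default
        else:
            # A list-like: map each item to the default.
            for key in portion:
                built[key] = default
    return built


tag_map = build_tag_map(None, ["br", "hr"], {"p": ["p"]}, "img")
print(tag_map)
```

Later arguments overwrite earlier ones for the same key, so a partial map can be layered over a plain list of names.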
# Now, the parser classes.

class BeautifulStoneSoup(Tag, SGMLParser):

    """This class contains the basic parser and search code. It defines
    a parser that knows nothing about tag behavior except for the
    following:

      You can't close a tag without closing all the tags it encloses.
      That is, "<foo><bar></foo>" actually means
      "<foo><bar></bar></foo>".

    [Another possible explanation is "<foo><bar /></foo>", but since
    this class defines no SELF_CLOSING_TAGS, it will never use that
    explanation.]

    This class is useful for parsing XML or made-up markup languages,
    or when BeautifulSoup makes an assumption counter to what you were
    expecting."""

    SELF_CLOSING_TAGS = {}
    NESTABLE_TAGS = {}
    RESET_NESTING_TAGS = {}
    QUOTE_TAGS = {}
    PRESERVE_WHITESPACE_TAGS = []

    MARKUP_MASSAGE = [(re.compile('(<[^<>]*)/>'),
                       lambda x: x.group(1) + ' />'),
                      (re.compile('<!\s+([^<>]*)>'),
                       lambda x: '<!' + x.group(1) + '>')
                      ]

    ROOT_TAG_NAME = u'[document]'

    HTML_ENTITIES = "html"
    XML_ENTITIES = "xml"
    XHTML_ENTITIES = "xhtml"
    # TODO: This only exists for backwards-compatibility
    ALL_ENTITIES = XHTML_ENTITIES

    # Used when determining whether a text node is all whitespace and
    # can be replaced with a single space. A text node that contains
    # fancy Unicode spaces (usually non-breaking) should be left
    # alone.
    STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, }

    def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
                 markupMassage=True, smartQuotesTo=XML_ENTITIES,
                 convertEntities=None, selfClosingTags=None, isHTML=False):
        """The Soup object is initialized as the 'root tag', and the
        provided markup (which can be a string or a file-like object)
        is fed into the underlying parser.

        sgmllib will process most bad HTML, and the BeautifulSoup
        class has some tricks for dealing with some HTML that kills
        sgmllib, but Beautiful Soup can nonetheless choke or lose data
        if your data uses self-closing tags or declarations
        incorrectly.

        By default, Beautiful Soup uses regexes to sanitize input,
        avoiding the vast majority of these problems. If the problems
        don't apply to you, pass in False for markupMassage, and
        you'll get better performance.

        The default parser massage techniques fix the two most common
        instances of invalid HTML that choke sgmllib:

         <br/> (No space between name of closing tag and tag close)
         <! --Comment--> (Extraneous whitespace in declaration)

        You can pass in a custom list of (RE object, replace method)
        tuples to get Beautiful Soup to scrub your input the way you
        want."""

        self.parseOnlyThese = parseOnlyThese
        self.fromEncoding = fromEncoding
        self.smartQuotesTo = smartQuotesTo
        self.convertEntities = convertEntities
        # Set the rules for how we'll deal with the entities we
        # encounter
        if self.convertEntities:
            # It doesn't make sense to convert encoded characters to
            # entities even while you're converting entities to Unicode.
            # Just convert it all to Unicode.
            self.smartQuotesTo = None
            if convertEntities == self.HTML_ENTITIES:
                self.convertXMLEntities = False
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = True
            elif convertEntities == self.XHTML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = False
            elif convertEntities == self.XML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = False
                self.escapeUnrecognizedEntities = False
        else:
            self.convertXMLEntities = False
            self.convertHTMLEntities = False
            self.escapeUnrecognizedEntities = False

        self.instanceSelfClosingTags = buildTagMap(None, selfClosingTags)
        SGMLParser.__init__(self)

        if hasattr(markup, 'read'):        # It's a file-type object.
            markup = markup.read()
        self.markup = markup
        self.markupMassage = markupMassage
        try:
            self._feed(isHTML=isHTML)
        except StopParsing:
            pass
        self.markup = None                 # The markup can now be GCed

    def convert_charref(self, name):
        """This method fixes a bug in Python's SGMLParser."""
        try:
            n = int(name)
        except ValueError:
            return
        if not 0 <= n <= 127:  # ASCII ends at 127, not 255
            return
        return self.convert_codepoint(n)

    def _feed(self, inDocumentEncoding=None, isHTML=False):
        # Convert the document to Unicode.
        markup = self.markup
        if isinstance(markup, unicode):
            if not hasattr(self, 'originalEncoding'):
                self.originalEncoding = None
        else:
            dammit = UnicodeDammit(
                markup, [self.fromEncoding, inDocumentEncoding],
                smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
            markup = dammit.unicode
            self.originalEncoding = dammit.originalEncoding
            self.declaredHTMLEncoding = dammit.declaredHTMLEncoding
        if markup:
            if self.markupMassage:
                if not hasattr(self.markupMassage, "__iter__"):
                    self.markupMassage = self.MARKUP_MASSAGE
                for fix, m in self.markupMassage:
                    markup = fix.sub(m, markup)
                # TODO: We get rid of markupMassage so that the
                # soup object can be deepcopied later on. Some
                # Python installations can't copy regexes. If anyone
                # was relying on the existence of markupMassage, this
                # might cause problems.
                del self.markupMassage
        self.reset()

        SGMLParser.feed(self, markup)
        # Close out any unfinished strings and close all the open tags.
        self.endData()
        while self.currentTag.name != self.ROOT_TAG_NAME:
            self.popTag()

    def __getattr__(self, methodName):
        """This method routes method call requests to either the SGMLParser
        superclass or the Tag superclass, depending on the method name."""
        #print "__getattr__ called on %s.%s" % (self.__class__, methodName)

        if methodName.startswith('start_') or methodName.startswith('end_') \
               or methodName.startswith('do_'):
            return SGMLParser.__getattr__(self, methodName)
        elif not methodName.startswith('__'):
            return Tag.__getattr__(self, methodName)
        else:
            raise AttributeError

    def isSelfClosingTag(self, name):
        """Returns true iff the given string is the name of a
        self-closing tag according to this parser."""
        return self.SELF_CLOSING_TAGS.has_key(name) \
               or self.instanceSelfClosingTags.has_key(name)

    def reset(self):
        Tag.__init__(self, self, self.ROOT_TAG_NAME)
        self.hidden = 1
        SGMLParser.reset(self)
        self.currentData = []
        self.currentTag = None
        self.tagStack = []
        self.quoteStack = []
        self.pushTag(self)

    def popTag(self):
        tag = self.tagStack.pop()

        #print "Pop", tag.name
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag

    def pushTag(self, tag):
        #print "Push", tag.name
        if self.currentTag:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]

    def endData(self, containerClass=NavigableString):
        if self.currentData:
            currentData = u''.join(self.currentData)
            if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and
                not set([tag.name for tag in self.tagStack]).intersection(
                    self.PRESERVE_WHITESPACE_TAGS)):
                if '\n' in currentData:
                    currentData = '\n'
                else:
                    currentData = ' '
            self.currentData = []
            if self.parseOnlyThese and len(self.tagStack) <= 1 and \
                   (not self.parseOnlyThese.text or \
                    not self.parseOnlyThese.search(currentData)):
                return
            o = containerClass(currentData)
            o.setup(self.currentTag, self.previous)
            if self.previous:
                self.previous.next = o
            self.previous = o
            self.currentTag.contents.append(o)


    def _popToTag(self, name, inclusivePop=True):
        """Pops the tag stack up to and including the most recent
        instance of the given tag. If inclusivePop is false, pops the tag
        stack up to but *not* including the most recent instance of
        the given tag."""
        #print "Popping to %s" % name
        if name == self.ROOT_TAG_NAME:
            return

        numPops = 0
        mostRecentTag = None
        for i in range(len(self.tagStack)-1, 0, -1):
            if name == self.tagStack[i].name:
                numPops = len(self.tagStack)-i
                break
        if not inclusivePop:
            numPops = numPops - 1

        for i in range(0, numPops):
            mostRecentTag = self.popTag()
        return mostRecentTag

    def _smartPop(self, name):
        """We need to pop up to the previous tag of this type, unless
        one of this tag's nesting reset triggers comes between this
        tag and the previous tag of this type, OR unless this tag is a
        generic nesting trigger and another generic nesting trigger
        comes between this tag and the previous tag of this type.

        Examples:
         <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
         <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
         <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

         <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
         <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
         <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
        """

        nestingResetTriggers = self.NESTABLE_TAGS.get(name)
        isNestable = nestingResetTriggers is not None
        isResetNesting = self.RESET_NESTING_TAGS.has_key(name)
        popTo = None
        inclusive = True
        for i in range(len(self.tagStack)-1, 0, -1):
            p = self.tagStack[i]
            if (not p or p.name == name) and not isNestable:
                # Non-nestable tags get popped to the top or to their
                # last occurrence.
                popTo = name
                break
            if (nestingResetTriggers is not None
                and p.name in nestingResetTriggers) \
                or (nestingResetTriggers is None and isResetNesting
                    and self.RESET_NESTING_TAGS.has_key(p.name)):

                # If we encounter one of the nesting reset triggers
                # peculiar to this tag, or we encounter another tag
                # that causes nesting to reset, pop up to but not
                # including that tag.
                popTo = p.name
                inclusive = False
                break
            p = p.parent
        if popTo:
            self._popToTag(popTo, inclusive)

f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
    def unknown_starttag(self, name, attrs, selfClosing=0):
        #print "Start tag %s: %s" % (name, attrs)
        if self.quoteStack:
            #This is not a real tag.
            #print "<%s> is not real!" % name
            attrs = ''.join([' %s="%s"' % (x, y) for x, y in attrs])
            self.handle_data('<%s%s>' % (name, attrs))
            return
        self.endData()

        if not self.isSelfClosingTag(name) and not selfClosing:
            self._smartPop(name)

        if self.parseOnlyThese and len(self.tagStack) <= 1 \
               and (self.parseOnlyThese.text or not self.parseOnlyThese.searchTag(name, attrs)):
            return

        tag = Tag(self, name, attrs, self.currentTag, self.previous)
        if self.previous:
            self.previous.next = tag
        self.previous = tag
        self.pushTag(tag)
        if selfClosing or self.isSelfClosingTag(name):
            self.popTag()
        if name in self.QUOTE_TAGS:
            #print "Beginning quote (%s)" % name
            self.quoteStack.append(name)
            self.literal = 1
        return tag

    def unknown_endtag(self, name):
        #print "End tag %s" % name
        if self.quoteStack and self.quoteStack[-1] != name:
            #This is not a real end tag.
            #print "</%s> is not real!" % name
            self.handle_data('</%s>' % name)
            return
        self.endData()
        self._popToTag(name)
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()
            self.literal = (len(self.quoteStack) > 0)

    def handle_data(self, data):
        self.currentData.append(data)

    def _toStringSubclass(self, text, subclass):
        """Adds a certain piece of text to the tree as a NavigableString
        subclass."""
        self.endData()
        self.handle_data(text)
        self.endData(subclass)

    def handle_pi(self, text):
        """Handle a processing instruction as a ProcessingInstruction
        object, possibly one with a %SOUP-ENCODING% slot into which an
        encoding will be plugged later."""
        if text[:3] == "xml":
            text = u"xml version='1.0' encoding='%SOUP-ENCODING%'"
        self._toStringSubclass(text, ProcessingInstruction)

    def handle_comment(self, text):
        "Handle comments as Comment objects."
        self._toStringSubclass(text, Comment)

    def handle_charref(self, ref):
        "Handle character references as data."
        if self.convertEntities:
            data = unichr(int(ref))
        else:
            data = '&#%s;' % ref
        self.handle_data(data)

    def handle_entityref(self, ref):
        """Handle entity references as data, possibly converting known
        HTML and/or XML entity references to the corresponding Unicode
        characters."""
        data = None
        if self.convertHTMLEntities:
            try:
                data = unichr(name2codepoint[ref])
            except KeyError:
                pass

        if not data and self.convertXMLEntities:
                data = self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)

        if not data and self.convertHTMLEntities and \
            not self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
                # TODO: We've got a problem here. We're told this is
                # an entity reference, but it's not an XML entity
                # reference or an HTML entity reference. Nonetheless,
                # the logical thing to do is to pass it through as an
                # unrecognized entity reference.
                #
                # Except: when the input is "&carol;" this function
                # will be called with input "carol". When the input is
                # "AT&T", this function will be called with input
                # "T". We have no way of knowing whether a semicolon
                # was present originally, so we don't know whether
                # this is an unknown entity or just a misplaced
                # ampersand.
                #
                # The more common case is a misplaced ampersand, so I
                # escape the ampersand and omit the trailing semicolon.
                data = "&%s" % ref
        if not data:
            # This case is different from the one above, because we
            # haven't already gone through a supposedly comprehensive
            # mapping of entities to Unicode characters. We might not
            # have gone through any mapping at all. So the chances are
            # very high that this is a real entity, and not a
            # misplaced ampersand.
            data = "&%s;" % ref
        self.handle_data(data)

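# Illustrative sketch only, not part of BeautifulSoup.py: the fallback order
# described in handle_entityref above, rewritten as a standalone function so
# it can be tried in isolation. It is written for Python 3 (html.entities,
# chr) while the listing itself is Python 2. The names resolve_entity and
# XML_ENTITIES are hypothetical, and the table below assumes the five
# predefined XML entities rather than the library's own mapping.

```python
from html.entities import name2codepoint  # Python 3 location of the HTML entity table

# Assumed stand-in for XML_ENTITIES_TO_SPECIAL_CHARS (the five predefined XML entities).
XML_ENTITIES = {"lt": "<", "gt": ">", "amp": "&", "quot": '"', "apos": "'"}


def resolve_entity(ref, convert_html=False, convert_xml=False):
    """Hypothetical helper: return the text to emit for the entity name `ref`."""
    data = None
    if convert_html and ref in name2codepoint:
        # Known HTML entity: replace it with the Unicode character.
        data = chr(name2codepoint[ref])
    if not data and convert_xml:
        data = XML_ENTITIES.get(ref)
    if not data and convert_html and ref not in XML_ENTITIES:
        # The full HTML table was consulted and the name is unknown, so this
        # is most likely a stray ampersand ("AT&T"): no trailing semicolon.
        data = "&%s" % ref
    if not data:
        # No comprehensive lookup happened: assume a real entity and keep the ";".
        data = "&%s;" % ref
    return data


if __name__ == "__main__":
    for name, flags in [("eacute", {"convert_html": True}),
                        ("T", {"convert_html": True}),
                        ("carol", {})]:
        print(name, "->", repr(resolve_entity(name, **flags)))
```

# With HTML conversion on, "T" (from "AT&T") comes back without a semicolon,
# whereas with no conversion at all "carol" keeps it, matching the two
# comment blocks in the method above.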
    def handle_decl(self, data):
        "Handle DOCTYPEs and the like as Declaration objects."
        self._toStringSubclass(data, Declaration)

    def parse_declaration(self, i):
        """Treat a bogus SGML declaration as raw data. Treat a CDATA
        declaration as a CData object."""
        j = None
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1:
                k = len(self.rawdata)
            data = self.rawdata[i+9:k]
            j = k+3
            self._toStringSubclass(data, CData)
        else:
            try:
                j = SGMLParser.parse_declaration(self, i)
            except SGMLParseError:
                toHandle = self.rawdata[i:]
                self.handle_data(toHandle)
                j = i + len(toHandle)
        return j

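# Illustrative sketch only, not part of BeautifulSoup.py: the CDATA branch of
# parse_declaration above, isolated as a pure function so the index
# arithmetic is easy to check. The name split_cdata is hypothetical, and
# raising ValueError for a non-CDATA position is an assumption of this
# sketch (the method above delegates that case to SGMLParser instead).

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def split_cdata(rawdata, i):
    """Hypothetical helper: return (payload, next_index) for a CDATA section at index i."""
    if rawdata[i:i + len(CDATA_OPEN)] != CDATA_OPEN:
        raise ValueError("no CDATA section starts at index %d" % i)
    k = rawdata.find(CDATA_CLOSE, i)
    if k == -1:
        # Unterminated section: treat the rest of the input as the payload.
        k = len(rawdata)
    payload = rawdata[i + len(CDATA_OPEN):k]
    # Resume parsing just past the closing marker (as in the method above,
    # this can point beyond the end of the input when the marker is missing).
    return payload, k + len(CDATA_CLOSE)


if __name__ == "__main__":
    print(split_cdata("a<![CDATA[x<y]]>b", 1))
```

# The payload is returned untouched, so markup-like text such as "x<y" inside
# the section is never re-parsed as tags.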
1467 class BeautifulSoup(BeautifulStoneSoup): |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1468 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1469 """This parser knows the following facts about HTML: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1470 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1471 * Some tags have no closing tag and should be interpreted as being |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1472 closed as soon as they are encountered. |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1473 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1474 * The text inside some tags (ie. 'script') may contain tags which |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1475 are not really part of the document and which should be parsed |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1476 as text, not tags. If you want to parse the text as tags, you can |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1477 always fetch it and parse it explicitly. |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1478 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1479 * Tag nesting rules: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1480 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1481 Most tags can't be nested at all. For instance, the occurance of |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1482 a <p> tag should implicitly close the previous <p> tag. |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1483 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1484 <p>Para1<p>Para2 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1485 should be transformed into: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1486 <p>Para1</p><p>Para2 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1487 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1488 Some tags can be nested arbitrarily. For instance, the occurance |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1489 of a <blockquote> tag should _not_ implicitly close the previous |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1490 <blockquote> tag. |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1491 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1492 Alice said: <blockquote>Bob said: <blockquote>Blah |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1493 should NOT be transformed into: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1494 Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1495 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1496 Some tags can be nested, but the nesting is reset by the |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1497 interposition of other tags. For instance, a <tr> tag should |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1498 implicitly close the previous <tr> tag within the same <table>, |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1499 but not close a <tr> tag in another table. |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1500 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1501 <table><tr>Blah<tr>Blah |
      should be transformed into:
     <table><tr>Blah</tr><tr>Blah
      but,
     <tr>Blah<table><tr>Blah
      should NOT be transformed into
     <tr>Blah<table></tr><tr>Blah

    Differing assumptions about tag nesting rules are a major source
    of problems with the BeautifulSoup class. If BeautifulSoup is not
    treating as nestable a tag your page author treats as nestable,
    try ICantBelieveItsBeautifulSoup, MinimalSoup, or
    BeautifulStoneSoup before writing your own subclass."""

    def __init__(self, *args, **kwargs):
        # Default to converting smart quotes to HTML entities unless the
        # caller chose something else.
        if 'smartQuotesTo' not in kwargs:
            kwargs['smartQuotesTo'] = self.HTML_ENTITIES
        kwargs['isHTML'] = True
        BeautifulStoneSoup.__init__(self, *args, **kwargs)

    SELF_CLOSING_TAGS = buildTagMap(None,
                                    ('br', 'hr', 'input', 'img', 'meta',
                                     'spacer', 'link', 'frame', 'base', 'col'))

    PRESERVE_WHITESPACE_TAGS = set(['pre', 'textarea'])

    QUOTE_TAGS = {'script' : None, 'textarea' : None}

    #According to the HTML standard, each of these inline tags can
    #contain another tag of the same type. Furthermore, it's common
    #to actually use these tags this way.
    NESTABLE_INLINE_TAGS = ('span', 'font', 'q', 'object', 'bdo', 'sub', 'sup',
                            'center')

    #According to the HTML standard, these block tags can contain
    #another tag of the same type. Furthermore, it's common
    #to actually use these tags this way.
    NESTABLE_BLOCK_TAGS = ('blockquote', 'div', 'fieldset', 'ins', 'del')

    #Lists can contain other lists, but there are restrictions.
    NESTABLE_LIST_TAGS = { 'ol' : [],
                           'ul' : [],
                           'li' : ['ul', 'ol'],
                           'dl' : [],
                           'dd' : ['dl'],
                           'dt' : ['dl'] }

    #Tables can contain other tables, but there are restrictions.
    NESTABLE_TABLE_TAGS = {'table' : [],
                           'tr' : ['table', 'tbody', 'tfoot', 'thead'],
                           'td' : ['tr'],
                           'th' : ['tr'],
                           'thead' : ['table'],
                           'tbody' : ['table'],
                           'tfoot' : ['table'],
                           }

    NON_NESTABLE_BLOCK_TAGS = ('address', 'form', 'p', 'pre')

    #If one of these tags is encountered, all tags up to the next tag of
    #this type are popped.
    RESET_NESTING_TAGS = buildTagMap(None, NESTABLE_BLOCK_TAGS, 'noscript',
                                     NON_NESTABLE_BLOCK_TAGS,
                                     NESTABLE_LIST_TAGS,
                                     NESTABLE_TABLE_TAGS)

    NESTABLE_TAGS = buildTagMap([], NESTABLE_INLINE_TAGS, NESTABLE_BLOCK_TAGS,
                                NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS)

    # Used to detect the charset in a META tag; see start_meta
    CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)

    def start_meta(self, attrs):
        """Beautiful Soup can detect a charset included in a META tag,
        try to convert the document to that charset, and re-parse the
        document from the beginning."""
        httpEquiv = None
        contentType = None
        contentTypeIndex = None
        tagNeedsEncodingSubstitution = False

        # Find the http-equiv and content attributes, remembering where
        # the content attribute sits so it can be rewritten in place.
        for i, (key, value) in enumerate(attrs):
            key = key.lower()
            if key == 'http-equiv':
                httpEquiv = value
            elif key == 'content':
                contentType = value
                contentTypeIndex = i

        if httpEquiv and contentType: # It's an interesting meta tag.
            match = self.CHARSET_RE.search(contentType)
            if match:
                if (self.declaredHTMLEncoding is not None or
                    self.originalEncoding == self.fromEncoding):
                    # An HTML encoding was sniffed while converting
                    # the document to Unicode, or an HTML encoding was
                    # sniffed during a previous pass through the
                    # document, or an encoding was specified
                    # explicitly and it worked. Rewrite the meta tag.
                    def rewrite(match):
                        return match.group(1) + "%SOUP-ENCODING%"
                    newAttr = self.CHARSET_RE.sub(rewrite, contentType)
                    attrs[contentTypeIndex] = (attrs[contentTypeIndex][0],
                                               newAttr)
                    tagNeedsEncodingSubstitution = True
                else:
                    # This is our first pass through the document.
                    # Go through it again with the encoding information.
                    newCharset = match.group(3)
                    if newCharset and newCharset != self.originalEncoding:
                        self.declaredHTMLEncoding = newCharset
                        self._feed(self.declaredHTMLEncoding)
                        raise StopParsing
        tag = self.unknown_starttag("meta", attrs)
        if tag and tagNeedsEncodingSubstitution:
            tag.containsSubstitutions = True

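The charset handling in start_meta rests on one regular expression. The following is a minimal standalone sketch of what that pattern captures and how the rewrite step keeps the "charset=" prefix while swapping in the placeholder. The helper name rewrite_charset and the sample content string are illustrative assumptions, not part of the library.

```python
import re

# Same pattern as CHARSET_RE above, written as a raw string.
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)

def rewrite_charset(content, placeholder="%SOUP-ENCODING%"):
    """Hypothetical helper: keep group 1 ('; charset=') and replace the value."""
    return CHARSET_RE.sub(lambda m: m.group(1) + placeholder, content)

# Sample value of a META tag's content attribute (assumed for illustration).
content = "text/html; charset=ISO-8859-1"

match = CHARSET_RE.search(content)
declared = match.group(3)             # the declared charset name
rewritten = rewrite_charset(content)  # content with the placeholder swapped in
```

Group 3 is what start_meta stores as the declared encoding on the first pass; the rewritten string is what it writes back into the attribute once the encoding is known.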
class StopParsing(Exception):
    pass

class ICantBelieveItsBeautifulSoup(BeautifulSoup):

    """The BeautifulSoup class is oriented towards skipping over
    common HTML errors like unclosed tags. However, sometimes it makes
    errors of its own. For instance, consider this fragment:

     <b>Foo<b>Bar</b></b>

    This is perfectly valid (if bizarre) HTML. However, the
    BeautifulSoup class will implicitly close the first 'b' tag when it
    encounters the second 'b'. It will think the author wrote
    "<b>Foo<b>Bar", and didn't close the first 'b' tag, because
    there's no real-world reason to bold something that's already
    bold. When it encounters '</b></b>' it will close two more 'b'
    tags, for a grand total of three tags closed instead of two. This
    can throw off the rest of your document structure. The same is
    true of a number of other tags, listed below.

    It's much more common for someone to forget to close a 'b' tag
    than to actually use nested 'b' tags, and the BeautifulSoup class
    handles the common case. This class handles the not-so-common
    case: where you can't believe someone wrote what they did, but
    it's valid HTML and BeautifulSoup screwed up by assuming it
    wouldn't be."""

    I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS = \
     ('em', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'strong',
      'cite', 'code', 'dfn', 'kbd', 'samp', 'var', 'b')

    I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS = ('noscript',)

    NESTABLE_TAGS = buildTagMap([], BeautifulSoup.NESTABLE_TAGS,
                                I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS,
                                I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS)

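The effect of treating 'b' as nestable or not can be sketched with a toy tag stack. This is a deliberately simplified stand-in and not the parser's real nesting logic: the function open_tags_after and its nestable argument are hypothetical names, and only start tags are modelled.

```python
def open_tags_after(start_tags, nestable):
    """Return the stack of open tags after a run of start tags.

    Simplified illustration only: a tag that is not in `nestable`
    implicitly closes an earlier open tag of the same name.
    """
    stack = []
    for name in start_tags:
        if name not in nestable and name in stack:
            # Assume the author forgot the end tag: pop back to, and
            # including, the earlier tag of the same name.
            del stack[stack.index(name):]
        stack.append(name)
    return stack

# "<b>Foo<b>Bar" with 'b' treated as non-nestable vs. nestable.
default_view = open_tags_after(["b", "b"], nestable=set())
lenient_view = open_tags_after(["b", "b"], nestable={"b"})
```

With 'b' non-nestable only one 'b' remains open after the second start tag, which is why the two end tags that follow over-close the document; with 'b' nestable both stay open.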
class MinimalSoup(BeautifulSoup):
    """The MinimalSoup class is for parsing HTML that contains
    pathologically bad markup. It makes no assumptions about tag
    nesting, but it does know which tags are self-closing, that
    <script> tags contain JavaScript and should not be parsed, that
    META tags may contain encoding information, and so on.

    This also makes it better for subclassing than BeautifulStoneSoup
    or BeautifulSoup."""

    RESET_NESTING_TAGS = buildTagMap('noscript')
    NESTABLE_TAGS = {}

f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1672 class BeautifulSOAP(BeautifulStoneSoup): |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1673 """This class will push a tag with only a single string child into |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1674 the tag's parent as an attribute. The attribute's name is the tag |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1675 name, and the value is the string child. An example should give |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1676 the flavor of the change: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1677 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1678 <foo><bar>baz</bar></foo> |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1679 => |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1680 <foo bar="baz"><bar>baz</bar></foo> |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1681 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1682 You can then access fooTag['bar'] instead of fooTag.barTag.string. |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1683 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1684 This is, of course, useful for scraping structures that tend to |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1685 use subelements instead of attributes, such as SOAP messages. Note |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1686 that it modifies its input, so don't print the modified version |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1687 out. |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1688 |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1689 I'm not sure how many people really want to use this class; let me |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1690 know if you do. Mainly I like the name.""" |

    def popTag(self):
        if len(self.tagStack) > 1:
            tag = self.tagStack[-1]
            parent = self.tagStack[-2]
            parent._getAttrMap()
            if (isinstance(tag, Tag) and len(tag.contents) == 1 and
                isinstance(tag.contents[0], NavigableString) and
                not parent.attrMap.has_key(tag.name)):
                parent[tag.name] = tag.contents[0]
        BeautifulStoneSoup.popTag(self)
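
The effect of the popTag override above can be illustrated outside Beautiful Soup. The following is only a rough sketch, written for Python 3 with the standard library's xml.etree.ElementTree rather than the Python 2 classes in this file; the helper name promote_text_children is hypothetical and is not part of Beautiful Soup. It copies each text-only child element into an attribute on its parent, unless the parent already has an attribute with that name.

```python
import xml.etree.ElementTree as ET


def promote_text_children(element):
    """Copy each text-only child into an attribute on its parent, in place.

    Hypothetical helper: it approximates what BeautifulSOAP.popTag does
    while parsing, but works on an already-built ElementTree.
    """
    for child in element:
        # Handle grandchildren first, mirroring the order in which a
        # parser closes (pops) the innermost tags before their parents.
        promote_text_children(child)
        has_only_text = len(child) == 0 and child.text is not None
        # Never overwrite an attribute the parent already carries.
        if has_only_text and child.tag not in element.attrib:
            element.set(child.tag, child.text)
    return element


root = ET.fromstring("<foo><bar>baz</bar></foo>")
promote_text_children(root)
print(ET.tostring(root, encoding="unicode"))
```

With the sample input, the printed document is `<foo bar="baz"><bar>baz</bar></foo>`: the child element stays in place and its text is duplicated as an attribute, which is why the docstring warns that the input is modified.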

#Enterprise class names! It has come to our attention that some people
#think the names of the Beautiful Soup parser classes are too silly
#and "unprofessional" for use in enterprise screen-scraping. We feel
#your pain! For such-minded folk, the Beautiful Soup Consortium And
#All-Night Kosher Bakery recommends renaming this file to
#"RobustParser.py" (or, in cases of extreme enterprisiness,
#"RobustParserBeanInterface.class") and using the following
#enterprise-friendly class aliases:
class RobustXMLParser(BeautifulStoneSoup):
    pass
class RobustHTMLParser(BeautifulSoup):
    pass
class RobustWackAssHTMLParser(ICantBelieveItsBeautifulSoup):
    pass
class RobustInsanelyWackAssHTMLParser(MinimalSoup):
    pass
class SimplifyingSOAPParser(BeautifulSOAP):
    pass

######################################################
#
# Bonus library: Unicode, Dammit
#
# This class forces XML data into a standard format (usually to UTF-8
# or Unicode). It is heavily based on code from Mark Pilgrim's
# Universal Feed Parser. It does not rewrite the XML or HTML to
# reflect a new encoding: that happens in BeautifulStoneSoup.handle_pi
# (XML) and BeautifulSoup.start_meta (HTML).
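
The class introduced by the comment above works by trying candidate encodings in order until one of them decodes the input, ending with utf-8 and windows-1252 as last resorts. A minimal sketch of that idea, assuming Python 3 bytes input (the listing itself is Python 2), might look like the following. The function name decode_with_fallbacks and its parameters are hypothetical; it leaves out the declared-encoding sniffing, chardet, and smart-quote handling that UnicodeDammit performs.

```python
def decode_with_fallbacks(data, candidates=()):
    """Return (text, encoding) for the first encoding that decodes data.

    Hypothetical helper: candidates are tried first, then utf-8 and
    windows-1252. Returns (None, None) if nothing works.
    """
    tried = []
    for encoding in tuple(candidates) + ("utf-8", "windows-1252"):
        # Skip empty proposals and encodings that were already attempted.
        if not encoding or encoding.lower() in tried:
            continue
        tried.append(encoding.lower())
        try:
            return data.decode(encoding), encoding
        except (UnicodeDecodeError, LookupError):
            # Wrong guess or unknown codec name: move on to the next one.
            continue
    return None, None


text, used = decode_with_fallbacks(b"caf\xe9")
print(used)
```

Here the lone byte 0xE9 is not valid UTF-8, so the sketch falls through to windows-1252 and prints that name.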

# Autodetects character encodings.
# Download from http://chardet.feedparser.org/
try:
    import chardet
#    import chardet.constants
#    chardet.constants._debug = 1
except ImportError:
    chardet = None

# cjkcodecs and iconv_codec make Python know about more character encodings.
# Both are available from http://cjkpython.i18n.org/
# They're built in if you use Python 2.4.
try:
    import cjkcodecs.aliases
except ImportError:
    pass
try:
    import iconv_codec
except ImportError:
    pass
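
The guarded imports above make chardet an optional dependency: when it is missing, the name is bound to None and callers are expected to check for that before using it. As a small sketch of how such a check could look in calling code (Python 3 here; the function guess_encoding and its default parameter are hypothetical and do not appear in this file):

```python
try:
    import chardet  # third-party package; may not be installed
except ImportError:
    chardet = None


def guess_encoding(data, default="utf-8"):
    """Return chardet's guess for data, or default if no guess is available.

    Hypothetical helper illustrating the "module or None" convention.
    """
    if chardet is None:
        # Optional dependency is absent, so fall back to the default.
        return default
    # chardet.detect returns a dict whose 'encoding' value may be None.
    return chardet.detect(data)["encoding"] or default


print(guess_encoding(b"hello world"))
```

The printed value depends on whether chardet is installed, so no particular output is assumed here.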

class UnicodeDammit:
    """A class for detecting the encoding of a *ML document and
    converting it to a Unicode string. If the source encoding is
    windows-1252, can replace MS smart quotes with their HTML or XML
    equivalents."""

    # This dictionary maps commonly seen values for "charset" in HTML
    # meta tags to the corresponding Python codec names. It only covers
    # values that aren't in Python's aliases and can't be determined
    # by the heuristics in find_codec.
    CHARSET_ALIASES = { "macintosh" : "mac-roman",
                        "x-sjis" : "shift-jis" }

    def __init__(self, markup, overrideEncodings=[],
                 smartQuotesTo='xml', isHTML=False):
        self.declaredHTMLEncoding = None
        self.markup, documentEncoding, sniffedEncoding = \
                     self._detectEncoding(markup, isHTML)
        self.smartQuotesTo = smartQuotesTo
        self.triedEncodings = []
        if markup == '' or isinstance(markup, unicode):
            self.originalEncoding = None
            self.unicode = unicode(markup)
            return

        u = None
        for proposedEncoding in overrideEncodings:
            u = self._convertFrom(proposedEncoding)
            if u: break
        if not u:
            for proposedEncoding in (documentEncoding, sniffedEncoding):
                u = self._convertFrom(proposedEncoding)
                if u: break

        # If no luck and we have auto-detection library, try that:
        if not u and chardet and not isinstance(self.markup, unicode):
            u = self._convertFrom(chardet.detect(self.markup)['encoding'])

        # As a last resort, try utf-8 and windows-1252:
        if not u:
            for proposed_encoding in ("utf-8", "windows-1252"):
                u = self._convertFrom(proposed_encoding)
                if u: break

        self.unicode = u
        if not u: self.originalEncoding = None

    def _subMSChar(self, orig):
        """Changes a MS smart quote character to an XML or HTML
        entity."""
        sub = self.MS_CHARS.get(orig)
        if isinstance(sub, tuple):
            if self.smartQuotesTo == 'xml':
                sub = '&#x%s;' % sub[1]
            else:
                sub = '&%s;' % sub[0]
        return sub

    def _convertFrom(self, proposed):
        proposed = self.find_codec(proposed)
        if not proposed or proposed in self.triedEncodings:
            return None
        self.triedEncodings.append(proposed)
        markup = self.markup

        # Convert smart quotes to HTML if coming from an encoding
        # that might have them.
        if self.smartQuotesTo and proposed.lower() in("windows-1252",
                                                      "iso-8859-1",
                                                      "iso-8859-2"):
            markup = re.compile("([\x80-\x9f])").sub \
                     (lambda(x): self._subMSChar(x.group(1)),
                      markup)

        try:
            # print "Trying to convert document to %s" % proposed
            u = self._toUnicode(markup, proposed)
            self.markup = u
            self.originalEncoding = proposed
        except Exception, e:
            # print "That didn't work!"
            # print e
            return None
        #print "Correct encoding: %s" % proposed
        return self.markup

    def _toUnicode(self, data, encoding):
        '''Given a string and its encoding, decodes the string into Unicode.
        %encoding is a string recognized by encodings.aliases'''

        # strip Byte Order Mark (if present)
        if (len(data) >= 4) and (data[:2] == '\xfe\xff') \
               and (data[2:4] != '\x00\x00'):
            encoding = 'utf-16be'
            data = data[2:]
        elif (len(data) >= 4) and (data[:2] == '\xff\xfe') \
                 and (data[2:4] != '\x00\x00'):
            encoding = 'utf-16le'
            data = data[2:]
        elif data[:3] == '\xef\xbb\xbf':
            encoding = 'utf-8'
            data = data[3:]
        elif data[:4] == '\x00\x00\xfe\xff':
            encoding = 'utf-32be'
            data = data[4:]
        elif data[:4] == '\xff\xfe\x00\x00':
            encoding = 'utf-32le'
            data = data[4:]
        newdata = unicode(data, encoding)
        return newdata

    def _detectEncoding(self, xml_data, isHTML=False):
        """Given a document, tries to detect its XML encoding."""
        xml_encoding = sniffed_xml_encoding = None
        try:
            if xml_data[:4] == '\x4c\x6f\xa7\x94':
                # EBCDIC
                xml_data = self._ebcdic_to_ascii(xml_data)
            elif xml_data[:4] == '\x00\x3c\x00\x3f':
                # UTF-16BE
                sniffed_xml_encoding = 'utf-16be'
                xml_data = unicode(xml_data, 'utf-16be').encode('utf-8')
            elif (len(xml_data) >= 4) and (xml_data[:2] == '\xfe\xff') \
                     and (xml_data[2:4] != '\x00\x00'):
                # UTF-16BE with BOM
                sniffed_xml_encoding = 'utf-16be'
                xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8')
            elif xml_data[:4] == '\x3c\x00\x3f\x00':
                # UTF-16LE
                sniffed_xml_encoding = 'utf-16le'
                xml_data = unicode(xml_data, 'utf-16le').encode('utf-8')
            elif (len(xml_data) >= 4) and (xml_data[:2] == '\xff\xfe') and \
                     (xml_data[2:4] != '\x00\x00'):
                # UTF-16LE with BOM
                sniffed_xml_encoding = 'utf-16le'
                xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8')
            elif xml_data[:4] == '\x00\x00\x00\x3c':
                # UTF-32BE
                sniffed_xml_encoding = 'utf-32be'
                xml_data = unicode(xml_data, 'utf-32be').encode('utf-8')
            elif xml_data[:4] == '\x3c\x00\x00\x00':
                # UTF-32LE
                sniffed_xml_encoding = 'utf-32le'
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1896 xml_data = unicode(xml_data, 'utf-32le').encode('utf-8') |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1897 elif xml_data[:4] == '\x00\x00\xfe\xff': |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1898 # UTF-32BE with BOM |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1899 sniffed_xml_encoding = 'utf-32be' |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1900 xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8') |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1901 elif xml_data[:4] == '\xff\xfe\x00\x00': |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1902 # UTF-32LE with BOM |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1903 sniffed_xml_encoding = 'utf-32le' |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1904 xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8') |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1905 elif xml_data[:3] == '\xef\xbb\xbf': |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1906 # UTF-8 with BOM |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1907 sniffed_xml_encoding = 'utf-8' |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1908 xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8') |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1909 else: |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1910 sniffed_xml_encoding = 'ascii' |
f02e37f395ae
Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff
changeset
|
1911 pass |
        except:
            xml_encoding_match = None
        xml_encoding_match = re.compile(
            '^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
        if not xml_encoding_match and isHTML:
            regexp = re.compile('<\s*meta[^>]+charset=([^>]*?)[;\'">]', re.I)
            xml_encoding_match = regexp.search(xml_data)
        if xml_encoding_match is not None:
            xml_encoding = xml_encoding_match.groups()[0].lower()
            if isHTML:
                self.declaredHTMLEncoding = xml_encoding
            if sniffed_xml_encoding and \
               (xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode',
                                 'iso-10646-ucs-4', 'ucs-4', 'csucs4',
                                 'utf-16', 'utf-32', 'utf_16', 'utf_32',
                                 'utf16', 'u16')):
                xml_encoding = sniffed_xml_encoding
        return xml_data, xml_encoding, sniffed_xml_encoding
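
The byte-order-mark branches of the detection code above can be illustrated with a small standalone sketch. It is written for Python 3 (the listing itself is Python 2), and `sniff_bom` and `_BOMS` are hypothetical names, not part of BeautifulSoup. Where the listing tells UTF-16 from UTF-32 by checking `xml_data[2:4] != '\x00\x00'`, this sketch assumes that testing the longer UTF-32 marks first achieves the same disambiguation.

```python
import codecs

# Hypothetical table for illustration. Order matters: the UTF-32LE BOM
# (FF FE 00 00) starts with the UTF-16LE BOM (FF FE), so the 4-byte
# marks are tested before the 2-byte ones.
_BOMS = (
    (codecs.BOM_UTF32_BE, 'utf-32be'),
    (codecs.BOM_UTF32_LE, 'utf-32le'),
    (codecs.BOM_UTF8, 'utf-8'),
    (codecs.BOM_UTF16_BE, 'utf-16be'),
    (codecs.BOM_UTF16_LE, 'utf-16le'),
)


def sniff_bom(data):
    """Return (encoding, data_without_bom); encoding is None if no BOM."""
    for bom, name in _BOMS:
        if data.startswith(bom):
            return name, data[len(bom):]
    return None, data
```

Unlike the listing, the sketch only strips the mark and reports the encoding; it does not re-encode the payload to UTF-8 or handle documents that lack a BOM.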


    def find_codec(self, charset):
        return self._codec(self.CHARSET_ALIASES.get(charset, charset)) \
               or (charset and self._codec(charset.replace("-", ""))) \
               or (charset and self._codec(charset.replace("-", "_"))) \
               or charset

    def _codec(self, charset):
        if not charset: return charset
        codec = None
        try:
            codecs.lookup(charset)
            codec = charset
        except (LookupError, ValueError):
            pass
        return codec
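
The fallback chain in `find_codec` can be spelled out as an explicit loop. The following is a minimal Python 3 sketch, not the library's code: `ALIASES` is a hypothetical stand-in for `CHARSET_ALIASES` (which is defined outside this part of the listing), and `known_codec` plays the role of `_codec`.

```python
import codecs

# Hypothetical alias table, for illustration only.
ALIASES = {'macintosh': 'mac-roman'}


def known_codec(name):
    """Return name if Python has a codec registered under it, else None."""
    if not name:
        return None
    try:
        codecs.lookup(name)
    except (LookupError, ValueError):
        return None
    return name


def find_codec(charset):
    """Try the alias, then the name without hyphens, then with underscores."""
    candidates = [ALIASES.get(charset, charset)]
    if charset:
        candidates.append(charset.replace('-', ''))
        candidates.append(charset.replace('-', '_'))
    for candidate in candidates:
        found = known_codec(candidate)
        if found:
            return found
    # Nothing matched: hand the original name back unchanged.
    return charset
```

As in the listing, an unrecognized charset is returned as-is rather than raising, so the caller decides what to do when a later decode attempt fails.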

    EBCDIC_TO_ASCII_MAP = None
    def _ebcdic_to_ascii(self, s):
        c = self.__class__
        if not c.EBCDIC_TO_ASCII_MAP:
            emap = (0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
                    16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
                    128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
                    144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
                    32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
                    38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
                    45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
                    186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
                    195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,
                    201,202,106,107,108,109,110,111,112,113,114,203,204,205,
                    206,207,208,209,126,115,116,117,118,119,120,121,122,210,
                    211,212,213,214,215,216,217,218,219,220,221,222,223,224,
                    225,226,227,228,229,230,231,123,65,66,67,68,69,70,71,72,
                    73,232,233,234,235,236,237,125,74,75,76,77,78,79,80,81,
                    82,238,239,240,241,242,243,92,159,83,84,85,86,87,88,89,
                    90,244,245,246,247,248,249,48,49,50,51,52,53,54,55,56,57,
                    250,251,252,253,254,255)
            import string
            c.EBCDIC_TO_ASCII_MAP = string.maketrans( \
            ''.join(map(chr, range(256))), ''.join(map(chr, emap)))
        return s.translate(c.EBCDIC_TO_ASCII_MAP)
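
The lazily built translation table used by `_ebcdic_to_ascii` can be shown in miniature. This Python 3 sketch is an illustration under stated assumptions: `OVERRIDES`, `build_table` and `ebcdic_to_ascii` are hypothetical names, and only five entries of the 256-entry `emap` are reproduced (indexes 0x4C, 0x6F, 0xA7, 0x94 and 0x93, whose values in the table above are 60, 63, 120, 109 and 108). Every other byte is left unchanged here, which the real table does not do.

```python
# Five entries copied from the emap table, keyed by EBCDIC byte value.
OVERRIDES = {0x4C: '<', 0x6F: '?', 0xA7: 'x', 0x94: 'm', 0x93: 'l'}

_TABLE = None  # built on first use, mirroring the class-level cache


def build_table(overrides):
    """Start from the identity mapping and patch in the overrides."""
    table = bytearray(range(256))
    for src, dst in overrides.items():
        table[src] = ord(dst)
    return bytes(table)


def ebcdic_to_ascii(data):
    """Translate bytes through the (partial) table, building it once."""
    global _TABLE
    if _TABLE is None:
        _TABLE = build_table(OVERRIDES)
    return data.translate(_TABLE)
```

For example, `ebcdic_to_ascii(b'\x4c\x6f\xa7\x94\x93')` yields `b'<?xml'`, the start of an XML declaration.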

    MS_CHARS = { '\x80' : ('euro', '20AC'),
                 '\x81' : ' ',
                 '\x82' : ('sbquo', '201A'),
                 '\x83' : ('fnof', '192'),
                 '\x84' : ('bdquo', '201E'),
                 '\x85' : ('hellip', '2026'),
                 '\x86' : ('dagger', '2020'),
                 '\x87' : ('Dagger', '2021'),
                 '\x88' : ('circ', '2C6'),
                 '\x89' : ('permil', '2030'),
                 '\x8A' : ('Scaron', '160'),
                 '\x8B' : ('lsaquo', '2039'),
                 '\x8C' : ('OElig', '152'),
                 '\x8D' : '?',
                 '\x8E' : ('#x17D', '17D'),
                 '\x8F' : '?',
                 '\x90' : '?',
                 '\x91' : ('lsquo', '2018'),
                 '\x92' : ('rsquo', '2019'),
                 '\x93' : ('ldquo', '201C'),
                 '\x94' : ('rdquo', '201D'),
                 '\x95' : ('bull', '2022'),
                 '\x96' : ('ndash', '2013'),
                 '\x97' : ('mdash', '2014'),
                 '\x98' : ('tilde', '2DC'),
                 '\x99' : ('trade', '2122'),
                 '\x9a' : ('scaron', '161'),
                 '\x9b' : ('rsaquo', '203A'),
                 '\x9c' : ('oelig', '153'),
                 '\x9d' : '?',
                 '\x9e' : ('#x17E', '17E'),
                 '\x9f' : ('Yuml', '178'),}
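
This part of the listing does not show how `MS_CHARS` is consumed. One plausible use of a table of this shape, where a value is either an (entity name, hex code point) pair or a plain replacement string, is sketched below in Python 3. `MS_CHARS_SUBSET`, `sub_ms_chars` and the `mode` parameter are hypothetical names chosen for the illustration, and only a few entries are copied.

```python
import re

# A few entries copied from MS_CHARS, keyed by bytes for Python 3.
MS_CHARS_SUBSET = {
    b'\x91': ('lsquo', '2018'),
    b'\x92': ('rsquo', '2019'),
    b'\x93': ('ldquo', '201C'),
    b'\x94': ('rdquo', '201D'),
    b'\x8d': '?',
}


def sub_ms_chars(data, mode='xml'):
    """Replace Windows-1252 bytes 0x80-0x9F with entities.

    mode='xml' emits numeric references, any other value emits named
    HTML entities. Bytes missing from the subset are left untouched.
    """
    def repl(match):
        sub = MS_CHARS_SUBSET.get(match.group(0))
        if sub is None:
            return match.group(0)
        if isinstance(sub, tuple):
            name, hexcode = sub
            sub = '&#x%s;' % hexcode if mode == 'xml' else '&%s;' % name
        return sub.encode('ascii')

    return re.sub(rb'[\x80-\x9f]', repl, data)
```

In this scheme an empty hex code in the table would produce the malformed reference `&#x;`, so every tuple needs a non-empty second element.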

#######################################################################


#By default, act as an HTML pretty-printer.
if __name__ == '__main__':
    import sys
    soup = BeautifulSoup(sys.stdin)
    print soup.prettify()