annotate resources/BeautifulSoup.py @ 126:47209552ec46

Shell-escaped all command arguments in HgRepo.GetCommand so that the commands work properly with ugly file names (in my case, names containing parentheses). Wrapping revision arguments in quotes is no longer necessary, so all of that was removed as well.
author namark <nshan.nnnn@gmail.com>
date Wed, 02 Dec 2015 22:45:12 +0400
parents f02e37f395ae
children
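The commit message above describes shell-escaping each argument before the command line is assembled. Here is a minimal Python 3 sketch of that idea. It assumes the arguments are joined into one string that a shell then parses. `build_command` is a hypothetical helper and does not reproduce the actual `HgRepo.GetCommand`.

```python
import shlex


def build_command(args):
    """Join arguments into one shell command line, quoting each one.

    Hypothetical helper: it only illustrates the escaping step and is not
    the real HgRepo.GetCommand implementation.
    """
    # shlex.quote leaves "safe" arguments untouched and wraps anything with
    # spaces, parentheses, or other shell metacharacters in single quotes.
    return " ".join(shlex.quote(arg) for arg in args)


print(build_command(["hg", "add", "notes (draft).txt"]))
# hg add 'notes (draft).txt'

# A revset containing parentheses is quoted the same way, so callers no
# longer need to wrap revision arguments in quotes by hand.
print(build_command(["hg", "log", "-r", "ancestors(.)"]))
# hg log -r 'ancestors(.)'
```

If the argument list can be passed straight to `subprocess.run` without `shell=True`, no quoting is needed at all, because no shell ever parses the arguments.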
rev 15: f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
"""Beautiful Soup
Elixir and Tonic
"The Screen-Scraper's Friend"
http://www.crummy.com/software/BeautifulSoup/

Beautiful Soup parses a (possibly invalid) XML or HTML document into a
tree representation. It provides methods and Pythonic idioms that make
it easy to navigate, search, and modify the tree.

A well-formed XML/HTML document yields a well-formed data
structure. An ill-formed XML/HTML document yields a correspondingly
ill-formed data structure. If your document is only locally
well-formed, you can use this library to find and process the
well-formed part of it.

Beautiful Soup works with Python 2.2 and up. It has no external
dependencies, but you'll have more success at converting data to UTF-8
if you also install these three packages:

* chardet, for auto-detecting character encodings
  http://chardet.feedparser.org/
* cjkcodecs and iconv_codec, which add more encodings to the ones supported
  by stock Python.
  http://cjkpython.i18n.org/

Beautiful Soup defines classes for two main parsing strategies:

* BeautifulStoneSoup, for parsing XML, SGML, or your domain-specific
  language that kind of looks like XML.

* BeautifulSoup, for parsing run-of-the-mill HTML code, be it valid
  or invalid. This class has web browser-like heuristics for
  obtaining a sensible parse tree in the face of common HTML errors.

Beautiful Soup also defines a class (UnicodeDammit) for autodetecting
the encoding of an HTML or XML document, and converting it to
Unicode. Much of this code is taken from Mark Pilgrim's Universal Feed Parser.

For more than you ever wanted to know about Beautiful Soup, see the
documentation:
http://www.crummy.com/software/BeautifulSoup/documentation.html

Here, have some legalese:

Copyright (c) 2004-2010, Leonard Richardson

All rights reserved.

Redistribution and use in source and binary forms, with or without
modification, are permitted provided that the following conditions are
met:

* Redistributions of source code must retain the above copyright
  notice, this list of conditions and the following disclaimer.

* Redistributions in binary form must reproduce the above
  copyright notice, this list of conditions and the following
  disclaimer in the documentation and/or other materials provided
  with the distribution.

* Neither the name of the the Beautiful Soup Consortium and All
  Night Kosher Bakery nor the names of its contributors may be
  used to endorse or promote products derived from this software
  without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS
"AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT
LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR
A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT OWNER OR
CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL,
EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO,
PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR
PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF
LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING
NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS
SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE, DAMMIT.

"""
from __future__ import generators

__author__ = "Leonard Richardson (leonardr@segfault.org)"
__version__ = "3.2.0"
__copyright__ = "Copyright (c) 2004-2010 Leonard Richardson"
__license__ = "New-style BSD"

from sgmllib import SGMLParser, SGMLParseError
import codecs
import markupbase
import types
import re
import sgmllib
try:
    from htmlentitydefs import name2codepoint
except ImportError:
    name2codepoint = {}
try:
    set
except NameError:
    from sets import Set as set

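The two `try`/`except` blocks above are feature-detection fallbacks. `htmlentitydefs` is a Python 2 module, renamed `html.entities` in Python 3, and `set` only became a builtin in Python 2.4. Here is a minimal sketch of the same fallback pattern written so that it also runs on Python 3. The nesting order is an illustrative choice, not taken from the original.

```python
try:
    # Python 3: the entity table lives in html.entities.
    from html.entities import name2codepoint
except ImportError:
    try:
        # Python 2 location, as used in the listing above.
        from htmlentitydefs import name2codepoint
    except ImportError:
        # Last resort, mirroring the original: no named entities at all.
        name2codepoint = {}

print(name2codepoint["amp"])          # 38
print(chr(name2codepoint["eacute"]))  # é
```
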
#These hacks make Beautiful Soup able to parse XML with namespaces
sgmllib.tagfind = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*')
markupbase._declname_match = re.compile(r'[a-zA-Z][-_.:a-zA-Z0-9]*\s*').match

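The two assignments above monkeypatch `sgmllib` and `markupbase` so that `:` is accepted inside tag and declaration names. This keeps a namespaced name such as `dc:title` together as one token. The check below exercises the patched patterns with plain `re`, so it runs on Python 3, where `sgmllib` no longer exists. The colon-less pattern is a hypothetical contrast and is not quoted from `sgmllib`.

```python
import re

# The pattern the listing installs as sgmllib.tagfind.
namespaced_tag = re.compile('[a-zA-Z][-_.:a-zA-Z0-9]*')
# Hypothetical colon-less variant, for contrast only.
plain_tag = re.compile('[a-zA-Z][-_.a-zA-Z0-9]*')

start_tag = "dc:title lang='en'"
print(namespaced_tag.match(start_tag).group())  # dc:title
print(plain_tag.match(start_tag).group())       # dc

# The declaration-name matcher also consumes trailing whitespace.
declname_match = re.compile(r'[a-zA-Z][-_.:a-zA-Z0-9]*\s*').match
print(repr(declname_match("xsl:stylesheet  version").group()))  # 'xsl:stylesheet  '
```
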
DEFAULT_OUTPUT_ENCODING = "utf-8"

def _match_css_class(str):
    """Build a RE to match the given CSS class."""
    return re.compile(r"(^|.*\s)%s($|\s)" % str)

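`_match_css_class` turns a class name into a regex. That regex accepts a `class` attribute value in which the name appears as a complete, whitespace-separated word. The sketch below copies the helper unchanged except for renaming its parameter, which avoids shadowing the builtin `str`. It then checks a few hypothetical attribute values.

```python
import re


def match_css_class(name):
    """Copy of _match_css_class from the listing, parameter renamed."""
    # (^|.*\s) : the class is at the start, or preceded by whitespace.
    # ($|\s)   : the class is at the end, or followed by whitespace.
    return re.compile(r"(^|.*\s)%s($|\s)" % name)


rx = match_css_class("nav")
for value in ["nav", "main nav", "nav sticky", "navbar", "top-nav"]:
    print(repr(value), bool(rx.match(value)))
# 'nav' True
# 'main nav' True
# 'nav sticky' True
# 'navbar' False
# 'top-nav' False
```

The name is interpolated without `re.escape`, so regex metacharacters in it keep their special meaning. For example, `match_css_class("a.b")` also matches the value `"axb"`. Interpolating `re.escape(name)` instead would restrict it to a literal match, if that is the intended behavior.
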
# First, the classes that represent markup elements.

class PageElement(object):
    """Contains the navigational information for some part of the page
    (either a tag or a piece of text)"""

    def setup(self, parent=None, previous=None):
        """Sets up the initial relations between this element and
        other elements."""
        self.parent = parent
        self.previous = previous
        self.next = None
        self.previousSibling = None
        self.nextSibling = None
        if self.parent and self.parent.contents:
            self.previousSibling = self.parent.contents[-1]
            self.previousSibling.nextSibling = self

    def replaceWith(self, replaceWith):
        oldParent = self.parent
        myIndex = self.parent.index(self)
        if (hasattr(replaceWith, "parent")
                and replaceWith.parent is self.parent):
            # We're replacing this element with one of its siblings.
            index = replaceWith.parent.index(replaceWith)
            if index < myIndex:
                # Furthermore, it comes before this element. That
                # means that when we extract it, the index of this
                # element will change. Compare explicitly rather than
                # testing `index` for truth: a sibling at index 0 also
                # comes before this element.
                myIndex = myIndex - 1
        self.extract()
        oldParent.insert(myIndex, replaceWith)

    def replaceWithChildren(self):
        myParent = self.parent
        myIndex = self.parent.index(self)
        self.extract()
        reversedChildren = list(self.contents)
        reversedChildren.reverse()
        for child in reversedChildren:
            myParent.insert(myIndex, child)

    def extract(self):
        """Destructively rips this element out of the tree."""
        if self.parent:
            try:
                del self.parent.contents[self.parent.index(self)]
            except ValueError:
                pass

        #Find the two elements that would be next to each other if
        #this element (and any children) hadn't been parsed. Connect
        #the two.
        lastChild = self._lastRecursiveChild()
        nextElement = lastChild.next

        if self.previous:
            self.previous.next = nextElement
        if nextElement:
            nextElement.previous = self.previous
        self.previous = None
        lastChild.next = None

        self.parent = None
        if self.previousSibling:
            self.previousSibling.nextSibling = self.nextSibling
        if self.nextSibling:
            self.nextSibling.previousSibling = self.previousSibling
        self.previousSibling = self.nextSibling = None
        return self

    def _lastRecursiveChild(self):
        "Finds the last element beneath this object to be parsed."
        lastChild = self
        while hasattr(lastChild, 'contents') and lastChild.contents:
            lastChild = lastChild.contents[-1]
        return lastChild

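`_lastRecursiveChild` follows the last child downward until it reaches a leaf, which is the final element of the subtree in document order. A minimal sketch of the same walk, assuming a hypothetical `Elem` class whose children live in a `contents` list and plain strings as text leaves (strings have no `contents` attribute, so the `hasattr` test stops on them):

```python
class Elem(object):
    """Hypothetical tag stand-in: a name plus an ordered list of children."""
    def __init__(self, name, contents=None):
        self.name = name
        self.contents = contents or []


def last_recursive_child(node):
    """Descend through the last child until reaching a leaf.

    A leaf is either an object without `contents` (here, a plain str)
    or an element whose `contents` list is empty.
    """
    last = node
    while hasattr(last, 'contents') and last.contents:
        last = last.contents[-1]
    return last


# Models <div><p>one</p><ul><li>two</li><li></li></ul></div>
empty_li = Elem('li')
tree = Elem('div', [Elem('p', ['one']),
                    Elem('ul', [Elem('li', ['two']), empty_li])])
```

Because the walk stops at an empty element, the trailing `<li></li>` itself is returned for `tree`.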
    def insert(self, position, newChild):
        if isinstance(newChild, basestring) \
            and not isinstance(newChild, NavigableString):
            newChild = NavigableString(newChild)

        position = min(position, len(self.contents))
        if hasattr(newChild, 'parent') and newChild.parent is not None:
            # We're 'inserting' an element that is already in the tree,
            # possibly as one of this object's own children.
            if newChild.parent is self:
                index = self.index(newChild)
                if index < position:
                    # Furthermore we're moving it further down the
                    # list of this object's children. That means that
                    # when we extract this element, our target index
                    # will jump down one.
                    position = position - 1
            newChild.extract()

        newChild.parent = self
        previousChild = None
        if position == 0:
            newChild.previousSibling = None
            newChild.previous = self
        else:
            previousChild = self.contents[position-1]
            newChild.previousSibling = previousChild
            newChild.previousSibling.nextSibling = newChild
            newChild.previous = previousChild._lastRecursiveChild()
        if newChild.previous:
            newChild.previous.next = newChild

        newChildsLastElement = newChild._lastRecursiveChild()

        if position >= len(self.contents):
            newChild.nextSibling = None

            parent = self
            parentsNextSibling = None
            while not parentsNextSibling:
                parentsNextSibling = parent.nextSibling
                parent = parent.parent
                if not parent: # This is the last element in the document.
                    break
            if parentsNextSibling:
                newChildsLastElement.next = parentsNextSibling
            else:
                newChildsLastElement.next = None
        else:
            nextChild = self.contents[position]
            newChild.nextSibling = nextChild
            if newChild.nextSibling:
                newChild.nextSibling.previousSibling = newChild
            newChildsLastElement.next = nextChild

        if newChildsLastElement.next:
            newChildsLastElement.next.previous = newChildsLastElement
        self.contents.insert(position, newChild)

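When `insert` moves an element that is already one of `self`'s children, extracting it first shifts every later child left by one, so the target index must drop by one exactly when the element's current index is below the target position. The sketch below models only the `contents` list with a plain Python list and a hypothetical helper `move_within`; `position` means "before the item currently at that index", measured before extraction. It uses `list.index`, which matches by equality, whereas a tree of distinct node objects would want an identity match.

```python
def move_within(items, item, position):
    """Move `item` (already in `items`) so it lands before the element
    that currently sits at `position`, or at the end if position is
    past the end of the list."""
    position = min(position, len(items))
    index = items.index(item)
    if index < position:
        # Removing the item shifts everything after it left by one,
        # so the target slot moves left by one as well.
        position -= 1
    items.pop(index)
    items.insert(position, item)
    return items
```

For example, moving `'a'` to position 3 in `['a', 'b', 'c', 'd']` places it before `'d'`, giving `['b', 'c', 'a', 'd']`; moving `'d'` to position 0 needs no adjustment.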
    def append(self, tag):
        """Appends the given tag to the contents of this tag."""
        self.insert(len(self.contents), tag)

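When `insert` places the new child last, the `while not parentsNextSibling` loop above finds what follows the new child's subtree in document order: the nearest `nextSibling` of `self` or of one of its ancestors, or nothing if the subtree ends the document. A minimal sketch of that walk, assuming a hypothetical `TreeNode` class that has only `parent` and `nextSibling` links:

```python
class TreeNode(object):
    """Hypothetical element with only parent and nextSibling links."""
    def __init__(self, name, parent=None):
        self.name = name
        self.parent = parent
        self.nextSibling = None


def following_element(node):
    """Return the first nextSibling found on `node` or any ancestor,
    i.e. what comes after node's subtree in document order.
    Returns None when the subtree ends the document."""
    current = node
    while current is not None:
        if current.nextSibling is not None:
            return current.nextSibling
        current = current.parent
    return None


# Models <html><body><div><p/></div><footer/></body></html>
html = TreeNode('html')
body = TreeNode('body', html)
div = TreeNode('div', body)
footer = TreeNode('footer', body)
div.nextSibling = footer
p = TreeNode('p', div)
```

Like the original loop, this checks the starting node first and then every ancestor up to and including the root.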
    def findNext(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the first item that matches the given criteria and
        appears after this Tag in the document."""
        return self._findOne(self.findAllNext, name, attrs, text, **kwargs)

    def findAllNext(self, name=None, attrs={}, text=None, limit=None,
                    **kwargs):
        """Returns all items that match the given criteria and appear
        after this Tag in the document."""
        return self._findAll(name, attrs, text, limit, self.nextGenerator,
                             **kwargs)

    def findNextSibling(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the closest sibling to this Tag that matches the
        given criteria and appears after this Tag in the document."""
        return self._findOne(self.findNextSiblings, name, attrs, text,
                             **kwargs)

    def findNextSiblings(self, name=None, attrs={}, text=None, limit=None,
                         **kwargs):
        """Returns the siblings of this Tag that match the given
        criteria and appear after this Tag in the document."""
        return self._findAll(name, attrs, text, limit,
                             self.nextSiblingGenerator, **kwargs)
    fetchNextSiblings = findNextSiblings # Compatibility with pre-3.x

    def findPrevious(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the first item that matches the given criteria and
        appears before this Tag in the document."""
        return self._findOne(self.findAllPrevious, name, attrs, text, **kwargs)

    def findAllPrevious(self, name=None, attrs={}, text=None, limit=None,
                        **kwargs):
        """Returns all items that match the given criteria and appear
        before this Tag in the document."""
        return self._findAll(name, attrs, text, limit, self.previousGenerator,
                             **kwargs)
    fetchPrevious = findAllPrevious # Compatibility with pre-3.x

    def findPreviousSibling(self, name=None, attrs={}, text=None, **kwargs):
        """Returns the closest sibling to this Tag that matches the
        given criteria and appears before this Tag in the document."""
        return self._findOne(self.findPreviousSiblings, name, attrs, text,
                             **kwargs)

    def findPreviousSiblings(self, name=None, attrs={}, text=None,
                             limit=None, **kwargs):
        """Returns the siblings of this Tag that match the given
        criteria and appear before this Tag in the document."""
        return self._findAll(name, attrs, text, limit,
                             self.previousSiblingGenerator, **kwargs)
    fetchPreviousSiblings = findPreviousSiblings # Compatibility with pre-3.x

    def findParent(self, name=None, attrs={}, **kwargs):
        """Returns the closest parent of this Tag that matches the given
        criteria."""
        # NOTE: We can't use _findOne because findParents takes a different
        # set of arguments.
        r = None
        l = self.findParents(name, attrs, 1, **kwargs)
        if l:
            r = l[0]
        return r

    def findParents(self, name=None, attrs={}, limit=None, **kwargs):
        """Returns the parents of this Tag that match the given
        criteria."""

        return self._findAll(name, attrs, None, limit, self.parentGenerator,
                             **kwargs)
    fetchParents = findParents # Compatibility with pre-3.x

    # These methods do the real heavy lifting.

    def _findOne(self, method, name, attrs, text, **kwargs):
        r = None
        l = method(name, attrs, text, 1, **kwargs)
        if l:
            r = l[0]
        return r

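`_findOne` turns any list-returning search into a single-result search: it calls the search with a limit of 1 and returns the first match, or `None` when nothing matched. The `findNext`, `findNextSibling`, `findPrevious` and `findPreviousSibling` wrappers all rely on it. A minimal sketch of the pattern, assuming a hypothetical `find_all` that filters a plain list with a predicate instead of a `SoupStrainer`:

```python
def find_all(items, predicate, limit=None):
    """Hypothetical list-returning search: collect matches, stopping
    early once `limit` results are found (None or 0 means no limit)."""
    results = []
    for item in items:
        if predicate(item):
            results.append(item)
            if limit and len(results) >= limit:
                break
    return results


def find_one(method, *args):
    """Mirror of _findOne: ask `method` for at most one result and
    unwrap it, returning None when nothing matched."""
    found = method(*args, limit=1)
    return found[0] if found else None


words = ['apple', 'bob', 'banana', 'cherry']
```

Passing the limit down lets the underlying search stop at the first hit instead of scanning the rest of the document.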
    def _findAll(self, name, attrs, text, limit, generator, **kwargs):
        "Iterates over a generator looking for things that match."

        if isinstance(name, SoupStrainer):
            strainer = name
        # (Possibly) special case some findAll*(...) searches
        elif text is None and not limit and not attrs and not kwargs:
            # findAll*(True)
            if name is True:
                return [element for element in generator()
                        if isinstance(element, Tag)]
            # findAll*('tag-name')
            elif isinstance(name, basestring):
                return [element for element in generator()
                        if isinstance(element, Tag) and
                        element.name == name]
            else:
                strainer = SoupStrainer(name, attrs, text, **kwargs)
        # Build a SoupStrainer
        else:
            strainer = SoupStrainer(name, attrs, text, **kwargs)
        results = ResultSet(strainer)
        g = generator()
        while True:
            try:
                i = g.next()
            except StopIteration:
                break
            if i:
                found = strainer.search(i)
                if found:
                    results.append(found)
                    if limit and len(results) >= limit:
                        break
        return results

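`_findAll` takes two routes. With no text, attribute or limit criteria, a bare `True` or a tag name is answered with a simple list comprehension over the generator. Anything else goes through a general matcher, skips falsy items, and stops as soon as `limit` results are collected. The sketch below reproduces that dispatch under loudly stated assumptions: `FakeTag` is a hypothetical tag class, plain strings stand in for text nodes, and a small predicate replaces `SoupStrainer`, whose real matching rules are much richer.

```python
class FakeTag(object):
    """Hypothetical tag: just a name and an attribute dict."""
    def __init__(self, name, **attrs):
        self.name = name
        self.attrs = attrs


def find_all(name, generator, limit=None, **attrs):
    """Sketch of _findAll's dispatch; the `matches` predicate is an
    assumption standing in for SoupStrainer."""
    if not limit and not attrs:
        if name is True:
            # Fast path: every tag, no per-element matcher needed.
            return [e for e in generator() if isinstance(e, FakeTag)]
        if isinstance(name, str):
            # Fast path: tags with exactly this name.
            return [e for e in generator()
                    if isinstance(e, FakeTag) and e.name == name]

    def matches(e):
        if not isinstance(e, FakeTag):
            return False
        if name not in (True, None) and e.name != name:
            return False
        return all(e.attrs.get(k) == v for k, v in attrs.items())

    results = []
    for e in generator():
        # Skip falsy items (a trailing None, empty strings), as the
        # `if i:` test in _findAll does.
        if e and matches(e):
            results.append(e)
            if limit and len(results) >= limit:
                break
    return results


doc = [FakeTag('p', id='a'), 'text', FakeTag('b'), FakeTag('p', id='b'), '', None]
gen = lambda: iter(doc)
```

Taking a generator factory rather than a list mirrors the original's design: the same search code serves forward, backward, sibling and parent traversals.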
    # These Generators can be used to navigate starting from both
    # NavigableStrings and Tags.
    def nextGenerator(self):
        i = self
        while i is not None:
            i = i.next
            yield i

    def nextSiblingGenerator(self):
        i = self
        while i is not None:
            i = i.nextSibling
            yield i

    def previousGenerator(self):
        i = self
        while i is not None:
            i = i.previous
            yield i

    def previousSiblingGenerator(self):
        i = self
        while i is not None:
            i = i.previousSibling
            yield i
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
394
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
395 def parentGenerator(self):
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
396 i = self
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
397 while i is not None:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
398 i = i.parent
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
399 yield i
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
400
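Each of these generators steps to the next element before yielding. The element it starts from is therefore never produced, and the last value yielded is `None`, so callers have to skip that sentinel. Below is a minimal Python 3 sketch of the same pattern. It uses a hypothetical `Node` class with only a `next` link in place of `PageElement`:

```python
class Node:
    """Hypothetical stand-in for PageElement: a name plus a `next` link."""

    def __init__(self, name, next=None):
        self.name = name
        self.next = next


def next_generator(node):
    # Same shape as nextGenerator: step first, then yield. The start node
    # is skipped and the last value yielded is None.
    i = node
    while i is not None:
        i = i.next
        yield i


c = Node("c")
b = Node("b", c)
a = Node("a", b)

raw = [n.name if n is not None else None for n in next_generator(a)]
names = [n.name for n in next_generator(a) if n is not None]
print(raw)    # ['b', 'c', None]
print(names)  # ['b', 'c']
```

The other four generators differ only in which link they follow (`nextSibling`, `previous`, `previousSibling`, `parent`).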
    # Utility methods
    def substituteEncoding(self, str, encoding=None):
        encoding = encoding or "utf-8"
        return str.replace("%SOUP-ENCODING%", encoding)

    def toEncoding(self, s, encoding=None):
        """Encodes an object to a byte string in the given encoding, or to
        Unicode if no encoding is given."""
        if isinstance(s, unicode):
            if encoding:
                s = s.encode(encoding)
        elif isinstance(s, str):
            if encoding:
                s = s.encode(encoding)
            else:
                s = unicode(s)
        else:
            if encoding:
                s = self.toEncoding(str(s), encoding)
            else:
                s = unicode(s)
        return s

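`toEncoding` handles three kinds of input: Unicode text, byte strings and other objects. In Python 3, Python 2's `unicode` corresponds to `str` and its byte `str` corresponds to `bytes`. The sketch below approximates both utility methods under that mapping. The function names are hypothetical. Decoding byte input as ASCII is an assumption that reproduces Python 2's implicit conversion:

```python
def substitute_encoding(text, encoding=None):
    # Like substituteEncoding: fill the %SOUP-ENCODING% placeholder,
    # falling back to utf-8 when no encoding is given.
    return text.replace("%SOUP-ENCODING%", encoding or "utf-8")


def to_encoding(s, encoding=None):
    """Python 3 approximation of PageElement.toEncoding."""
    if isinstance(s, str):
        # Text: encode to bytes only if an encoding was requested.
        return s.encode(encoding) if encoding else s
    if isinstance(s, bytes):
        # Python 2 decoded byte strings implicitly as ASCII in both branches.
        text = s.decode("ascii")
        return text.encode(encoding) if encoding else text
    # Anything else is converted to text first, then handled as above.
    return to_encoding(str(s), encoding)


print(substitute_encoding('xml version="1.0" encoding="%SOUP-ENCODING%"'))
print(to_encoding("caf\u00e9", "utf-8"))  # b'caf\xc3\xa9'
print(to_encoding(42))                    # 42
```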
class NavigableString(unicode, PageElement):

    def __new__(cls, value):
        """Create a new NavigableString.

        When unpickling a NavigableString, this method is called with
        the string in DEFAULT_OUTPUT_ENCODING. That encoding needs to be
        passed in to the superclass's __new__ or the superclass won't know
        how to handle non-ASCII characters.
        """
        if isinstance(value, unicode):
            return unicode.__new__(cls, value)
        return unicode.__new__(cls, value, DEFAULT_OUTPUT_ENCODING)

    def __getnewargs__(self):
        return (NavigableString.__str__(self),)

    def __getattr__(self, attr):
        """text.string gives you text. This is for backwards
        compatibility for Navigable*String, but for CData* it lets you
        get the string without the CData wrapper."""
        if attr == 'string':
            return self
        else:
            raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__.__name__, attr)

    def __unicode__(self):
        return str(self).decode(DEFAULT_OUTPUT_ENCODING)

    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        if encoding:
            return self.encode(encoding)
        else:
            return self

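`__new__` and `__getnewargs__` work as a pair. Pickling stores the arguments returned by `__getnewargs__` (the encoded byte string). Unpickling passes them back to `__new__`, so `__new__` must accept bytes. Below is a Python 3 sketch of that round trip. `TextNode` is a hypothetical `str` subclass, and `DEFAULT_OUTPUT_ENCODING` is assumed to be `"utf-8"`:

```python
import pickle

DEFAULT_OUTPUT_ENCODING = "utf-8"  # assumption for this sketch


class TextNode(str):
    """Hypothetical Python 3 analogue of NavigableString."""

    def __new__(cls, value):
        # Unpickling calls __new__ with whatever __getnewargs__ returned,
        # so byte input must be decoded with the same encoding.
        if isinstance(value, bytes):
            value = value.decode(DEFAULT_OUTPUT_ENCODING)
        return super().__new__(cls, value)

    def __getnewargs__(self):
        # Store the text as encoded bytes, as NavigableString does.
        return (self.encode(DEFAULT_OUTPUT_ENCODING),)


original = TextNode("na\u00efve")
restored = pickle.loads(pickle.dumps(original))
print(type(restored).__name__, restored)  # TextNode naïve
```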
class CData(NavigableString):

    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return "<![CDATA[%s]]>" % NavigableString.__str__(self, encoding)

class ProcessingInstruction(NavigableString):
    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        output = self
        if "%SOUP-ENCODING%" in output:
            output = self.substituteEncoding(output, encoding)
        return "<?%s?>" % self.toEncoding(output, encoding)

class Comment(NavigableString):
    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return "<!--%s-->" % NavigableString.__str__(self, encoding)

class Declaration(NavigableString):
    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return "<!%s>" % NavigableString.__str__(self, encoding)

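Each of these subclasses keeps the plain text as its value and adds markup only when `__str__` renders the object. `ProcessingInstruction` also fills in the `%SOUP-ENCODING%` placeholder. Below is a Python 3 sketch of the same idea with hypothetical standalone classes. It omits encoding handling except for the placeholder.

Inside the override, the bare text has to come from `str.__str__(self)`. Calling `str(self)` there would re-enter the override and recurse, which is presumably also why the original calls `NavigableString.__str__` explicitly.

```python
class CData(str):
    def __str__(self):
        # str.__str__ returns the bare text; str(self) here would recurse.
        return "<![CDATA[%s]]>" % str.__str__(self)


class Comment(str):
    def __str__(self):
        return "<!--%s-->" % str.__str__(self)


class Declaration(str):
    def __str__(self):
        return "<!%s>" % str.__str__(self)


class ProcessingInstruction(str):
    def __str__(self, encoding="utf-8"):
        # Substitute the placeholder before wrapping, as the original does.
        text = str.__str__(self).replace("%SOUP-ENCODING%", encoding)
        return "<?%s?>" % text


print(str(Comment(" note ")))            # <!-- note -->
print(str(Declaration("DOCTYPE html")))  # <!DOCTYPE html>
```

The objects still compare equal to their plain text (`Comment(" note ") == " note "`). Only their rendering differs.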
class Tag(PageElement):

    """Represents a found HTML tag with its attributes and contents."""

    def _invert(h):
        "Cheap function to invert a hash."
        i = {}
        for k,v in h.items():
            i[v] = k
        return i

    XML_ENTITIES_TO_SPECIAL_CHARS = { "apos" : "'",
                                      "quot" : '"',
                                      "amp" : "&",
                                      "lt" : "<",
                                      "gt" : ">" }

    XML_SPECIAL_CHARS_TO_ENTITIES = _invert(XML_ENTITIES_TO_SPECIAL_CHARS)

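`_invert` takes no `self` because it is never called as a method. It runs once, while the class body executes, to build `XML_SPECIAL_CHARS_TO_ENTITIES` from the forward table. The same trick works in Python 3. In the sketch below, the class name is hypothetical and a dict comprehension replaces the explicit loop. If two keys shared a value, the later one would win, which is harmless for these five entities.

```python
class EntityTables:
    """Hypothetical container mirroring the two tables on Tag."""

    def _invert(h):
        # Plain function called during class-body execution, not a method.
        return {v: k for k, v in h.items()}

    XML_ENTITIES_TO_SPECIAL_CHARS = {"apos": "'", "quot": '"', "amp": "&",
                                     "lt": "<", "gt": ">"}
    XML_SPECIAL_CHARS_TO_ENTITIES = _invert(XML_ENTITIES_TO_SPECIAL_CHARS)


print(EntityTables.XML_SPECIAL_CHARS_TO_ENTITIES["<"])  # lt
```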
    def _convertEntities(self, match):
        """Used in a call to re.sub to replace HTML, XML, and numeric
        entities with the appropriate Unicode characters. If HTML
        entities are being converted, any unrecognized entities are
        escaped."""
        x = match.group(1)
        if self.convertHTMLEntities and x in name2codepoint:
            return unichr(name2codepoint[x])
        elif x in self.XML_ENTITIES_TO_SPECIAL_CHARS:
            if self.convertXMLEntities:
                return self.XML_ENTITIES_TO_SPECIAL_CHARS[x]
            else:
                return u'&%s;' % x
        elif len(x) > 0 and x[0] == '#':
            # Handle numeric entities
            if len(x) > 1 and x[1] == 'x':
                return unichr(int(x[2:], 16))
            else:
                return unichr(int(x[1:]))

        elif self.escapeUnrecognizedEntities:
            return u'&amp;%s;' % x
        else:
            return u'&%s;' % x

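`_convertEntities` is written as an `re.sub` replacement callback. The regex in `__init__` captures the entity name (or `#`-prefixed number) in group 1, and the method returns the text that replaces the whole `&...;` match. Below is a Python 3 sketch that keeps the same decision order. It assumes the standard library's `html.entities.name2codepoint` as the HTML table, and it takes the three parser flags as keyword arguments instead of instance attributes:

```python
import re
from html.entities import name2codepoint

XML_ENTITIES = {"apos": "'", "quot": '"', "amp": "&", "lt": "<", "gt": ">"}
ENTITY_RE = re.compile(r"&(#\d+|#x[0-9a-fA-F]+|\w+);")


def convert_entities(match, convert_html=True, convert_xml=True,
                     escape_unrecognized=False):
    x = match.group(1)
    if convert_html and x in name2codepoint:
        return chr(name2codepoint[x])
    elif x in XML_ENTITIES:
        return XML_ENTITIES[x] if convert_xml else "&%s;" % x
    elif x.startswith("#"):
        # Numeric entity: hexadecimal after "#x", decimal after "#".
        # chr() raises ValueError for code points above 0x10FFFF.
        if x[1:2] == "x":
            return chr(int(x[2:], 16))
        return chr(int(x[1:]))
    elif escape_unrecognized:
        return "&amp;%s;" % x
    else:
        return "&%s;" % x


print(ENTITY_RE.sub(convert_entities, "caf&eacute; &lt;b&gt; &#65;&#x42; &bogus;"))
# café <b> AB &bogus;
```

To pass non-default flags, wrap the callback, e.g. `ENTITY_RE.sub(lambda m: convert_entities(m, escape_unrecognized=True), "&bogus;")`, which gives `"&amp;bogus;"`.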
    def __init__(self, parser, name, attrs=None, parent=None,
                 previous=None):
        "Basic constructor."

        # We don't actually store the parser object: that lets extracted
        # chunks be garbage-collected
        self.parserClass = parser.__class__
        self.isSelfClosing = parser.isSelfClosingTag(name)
        self.name = name
        if attrs is None:
            attrs = []
        elif isinstance(attrs, dict):
            attrs = attrs.items()
        self.attrs = attrs
        self.contents = []
        self.setup(parent, previous)
        self.hidden = False
        self.containsSubstitutions = False
        self.convertHTMLEntities = parser.convertHTMLEntities
        self.convertXMLEntities = parser.convertXMLEntities
        self.escapeUnrecognizedEntities = parser.escapeUnrecognizedEntities

        # Convert any HTML, XML, or numeric entities in the attribute values.
        convert = lambda(k, val): (k,
                                   re.sub("&(#\d+|#x[0-9a-fA-F]+|\w+);",
                                          self._convertEntities,
                                          val))
        self.attrs = map(convert, self.attrs)

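The constructor accepts attributes as `None`, a dict, or a list of `(name, value)` pairs. It normalizes them to a list, then runs every value through the entity regex. Two parts of this code do not carry over to Python 3:

- `lambda(k, val): ...` relies on tuple parameter unpacking, which Python 3 removed (PEP 3113).
- `map` returns an iterator in Python 3, not a list.

A port would therefore unpack explicitly and build a list. Below is a sketch under those assumptions. It uses a hypothetical, simplified callback that decodes only numeric entities and leaves named ones untouched:

```python
import re

ENTITY_RE = re.compile(r"&(#\d+|#x[0-9a-fA-F]+|\w+);")


def numeric_only(match):
    # Hypothetical simplified callback: decode numeric entities only.
    x = match.group(1)
    if x.startswith("#x"):
        return chr(int(x[2:], 16))
    if x.startswith("#"):
        return chr(int(x[1:]))
    return match.group(0)


def normalize_attrs(attrs, callback=numeric_only):
    # Same normalization as Tag.__init__: None -> [], dict -> pairs.
    if attrs is None:
        attrs = []
    elif isinstance(attrs, dict):
        attrs = list(attrs.items())
    # Unpack each pair explicitly instead of using a tuple-parameter lambda.
    return [(key, ENTITY_RE.sub(callback, value)) for key, value in attrs]


print(normalize_attrs({"title": "A&#x26;B", "class": "x"}))
# [('title', 'A&B'), ('class', 'x')]
```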
    def getString(self):
        if (len(self.contents) == 1
            and isinstance(self.contents[0], NavigableString)):
            return self.contents[0]

    def setString(self, string):
        """Replace the contents of the tag with a string"""
        self.clear()
        self.append(string)

    string = property(getString, setString)

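`string` is a read/write property. Reading it returns the single text child, or `None` (by falling off the end of `getString`) when the tag has zero children, several children, or a non-string child. Assigning to it replaces all children. Below is a Python 3 sketch with a hypothetical `MiniTag`. Its setter simply replaces the list, whereas the original `setString` goes through `clear()` and `append()` so that the old children are also detached from the tree.

```python
class MiniTag:
    """Hypothetical minimal tag: contents holds str or MiniTag children."""

    def __init__(self, *contents):
        self.contents = list(contents)

    def _get_string(self):
        # Only a tag whose sole child is text has a .string.
        if len(self.contents) == 1 and isinstance(self.contents[0], str):
            return self.contents[0]
        return None

    def _set_string(self, value):
        # Replace all children with the single string.
        self.contents = [value]

    string = property(_get_string, _set_string)


t = MiniTag("hello")
print(t.string)                        # hello
print(MiniTag("a", MiniTag()).string)  # None
t.string = "bye"
print(t.contents)                      # ['bye']
```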
    def getText(self, separator=u""):
        if not len(self.contents):
            return u""
        stopNode = self._lastRecursiveChild().next
        strings = []
        current = self.contents[0]
        while current is not stopNode:
            if isinstance(current, NavigableString):
                strings.append(current.strip())
            current = current.next
        return separator.join(strings)

    text = property(getText)

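`getText` walks the document-order `next` chain from the first child up to, but not including, the element that follows the tag's last descendant. It strips every string it meets and joins the results. The sketch below gets the same depth-first order from a recursive generator over a hypothetical nested `Node`, instead of following a linked list. As in the original, a whitespace-only string strips to `""` and still contributes a separator.

```python
class Node:
    """Hypothetical element: children are str leaves or nested Nodes."""

    def __init__(self, *children):
        self.children = list(children)


def iter_strings(node):
    # Depth-first, left to right: the order the `next` chain follows.
    for child in node.children:
        if isinstance(child, str):
            yield child
        else:
            yield from iter_strings(child)


def get_text(node, separator=""):
    # Strip each string, then join, as getText does.
    return separator.join(s.strip() for s in iter_strings(node))


doc = Node("  Hello ", Node(" big ", Node()), "world\n")
print(get_text(doc))       # Hellobigworld
print(get_text(doc, " "))  # Hello big world
```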
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
    def get(self, key, default=None):
        """Returns the value of the 'key' attribute for the tag, or
        the value given for 'default' if it doesn't have that
        attribute."""
        return self._getAttrMap().get(key, default)

    def clear(self):
        """Extract all children."""
        for child in self.contents[:]:
            child.extract()

    def index(self, element):
        for i, child in enumerate(self.contents):
            if child is element:
                return i
        raise ValueError("Tag.index: element not in tag")

    def has_key(self, key):
        return self._getAttrMap().has_key(key)

    def __getitem__(self, key):
        """tag[key] returns the value of the 'key' attribute for the tag,
        and raises KeyError if it's not there."""
        return self._getAttrMap()[key]

    def __iter__(self):
        "Iterating over a tag iterates over its contents."
        return iter(self.contents)

    def __len__(self):
        "The length of a tag is the length of its list of contents."
        return len(self.contents)

    def __contains__(self, x):
        return x in self.contents

    def __nonzero__(self):
        "A tag is always true in a boolean context, even if it has no contents."
        return True

    def __setitem__(self, key, value):
        """Setting tag[key] sets the value of the 'key' attribute for the
        tag."""
        self._getAttrMap()
        self.attrMap[key] = value
        found = False
        for i in range(0, len(self.attrs)):
            if self.attrs[i][0] == key:
                self.attrs[i] = (key, value)
                found = True
        if not found:
            self.attrs.append((key, value))
        self._getAttrMap()[key] = value

    def __delitem__(self, key):
        "Deleting tag[key] deletes all 'key' attributes for the tag."
        # Iterate over a copy: removing items from the list being
        # iterated would skip the element that follows each removal.
        for item in self.attrs[:]:
            if item[0] == key:
                self.attrs.remove(item)
                #We don't break because bad HTML can define the same
                #attribute multiple times.
        self._getAttrMap()
        if self.attrMap.has_key(key):
            del self.attrMap[key]

    def __call__(self, *args, **kwargs):
        """Calling a tag like a function is the same as calling its
        findAll() method. E.g., tag('a') returns a list of all the A tags
        found within this tag."""
        return apply(self.findAll, args, kwargs)

    def __getattr__(self, tag):
        #print "Getattr %s.%s" % (self.__class__, tag)
        if len(tag) > 3 and tag.rfind('Tag') == len(tag)-3:
            return self.find(tag[:-3])
        elif tag.find('__') != 0:
            return self.find(tag)
        raise AttributeError, "'%s' object has no attribute '%s'" % (self.__class__, tag)

    def __eq__(self, other):
        """Returns true iff this tag has the same name, the same attributes,
        and the same contents (recursively) as the given tag.

        NOTE: right now this will return false if two tags have the
        same attributes in a different order. Should this be fixed?"""
        if other is self:
            return True
        if (not hasattr(other, 'name') or
            not hasattr(other, 'attrs') or
            not hasattr(other, 'contents') or
            self.name != other.name or
            self.attrs != other.attrs or
            len(self) != len(other)):
            return False
        for i in range(0, len(self.contents)):
            if self.contents[i] != other.contents[i]:
                return False
        return True

    def __ne__(self, other):
        """Returns true iff this tag is not identical to the other tag,
        as defined in __eq__."""
        return not self == other

    def __repr__(self, encoding=DEFAULT_OUTPUT_ENCODING):
        """Renders this tag as a string."""
        return self.__str__(encoding)

    def __unicode__(self):
        return self.__str__(None)

    BARE_AMPERSAND_OR_BRACKET = re.compile("([<>]|"
                                           + "&(?!#\d+;|#x[0-9a-fA-F]+;|\w+;)"
                                           + ")")

    def _sub_entity(self, x):
        """Used with a regular expression to substitute the
        appropriate XML entity for an XML special character."""
        return "&" + self.XML_SPECIAL_CHARS_TO_ENTITIES[x.group(0)[0]] + ";"

    def __str__(self, encoding=DEFAULT_OUTPUT_ENCODING,
                prettyPrint=False, indentLevel=0):
        """Returns a string or Unicode representation of this tag and
        its contents. To get Unicode, pass None for encoding.

        NOTE: since Python's HTML parser consumes whitespace, this
        method is not certain to reproduce the whitespace present in
        the original string."""

        encodedName = self.toEncoding(self.name, encoding)

        attrs = []
        if self.attrs:
            for key, val in self.attrs:
                fmt = '%s="%s"'
                if isinstance(val, basestring):
                    if self.containsSubstitutions and '%SOUP-ENCODING%' in val:
                        val = self.substituteEncoding(val, encoding)

                    # The attribute value either:
                    #
                    # * Contains no embedded double quotes or single quotes.
                    #   No problem: we enclose it in double quotes.
                    # * Contains embedded single quotes. No problem:
                    #   double quotes work here too.
                    # * Contains embedded double quotes. No problem:
                    #   we enclose it in single quotes.
                    # * Embeds both single _and_ double quotes. This
                    #   can't happen naturally, but it can happen if
                    #   you modify an attribute value after parsing
                    #   the document. Now we have a bit of a
                    #   problem. We solve it by enclosing the
                    #   attribute in single quotes, and escaping any
                    #   embedded single quotes to XML entities.
                    if '"' in val:
                        fmt = "%s='%s'"
                        if "'" in val:
                            # Use the numeric character reference, which is
                            # valid in both HTML and XML ('&apos;' is not
                            # defined in HTML 4).
                            val = val.replace("'", "&#39;")

                    # Now we're okay w/r/t quotes. But the attribute
                    # value might also contain angle brackets, or
                    # ampersands that aren't part of entities. We need
                    # to escape those to XML entities too.
                    val = self.BARE_AMPERSAND_OR_BRACKET.sub(self._sub_entity, val)

                attrs.append(fmt % (self.toEncoding(key, encoding),
                                    self.toEncoding(val, encoding)))
742 close = ''
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
743 closeTag = ''
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
744 if self.isSelfClosing:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
745 close = ' /'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
746 else:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
747 closeTag = '</%s>' % encodedName
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
748
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
749 indentTag, indentContents = 0, 0
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
750 if prettyPrint:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
751 indentTag = indentLevel
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
752 space = (' ' * (indentTag-1))
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
753 indentContents = indentTag + 1
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
754 contents = self.renderContents(encoding, prettyPrint, indentContents)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
755 if self.hidden:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
756 s = contents
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
757 else:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
758 s = []
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
759 attributeString = ''
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
760 if attrs:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
761 attributeString = ' ' + ' '.join(attrs)
            if prettyPrint:
                s.append(space)
            s.append('<%s%s%s>' % (encodedName, attributeString, close))
            if prettyPrint:
                s.append("\n")
            s.append(contents)
            if prettyPrint and contents and contents[-1] != "\n":
                s.append("\n")
            if prettyPrint and closeTag:
                s.append(space)
            s.append(closeTag)
            if prettyPrint and closeTag and self.nextSibling:
                s.append("\n")
            s = ''.join(s)
        return s

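The tail of `Tag.__str__` above builds its output as a list of fragments and joins them once: optional indentation, the opening tag, the already-rendered contents, then the closing tag, with newlines added only when pretty-printing. The following is a minimal Python 3 sketch of that same sequence. It assumes a hypothetical `render_tag` helper that takes a pre-rendered `contents` string and a list of `(key, value)` attribute pairs; attribute quoting and entity escaping are deliberately left out.

```python
def render_tag(name, attrs, contents, pretty=False, indent_level=0,
               self_closing=False, has_next_sibling=False):
    """Hypothetical helper mirroring the fragment-joining logic of Tag.__str__."""
    # Simplification: attribute values are inserted as-is, with no escaping.
    attribute_string = ''.join(' %s="%s"' % (k, v) for k, v in attrs)
    close = ' /' if self_closing else ''
    close_tag = '' if self_closing else '</%s>' % name
    space = ' ' * (indent_level - 1)      # a negative count yields ''
    s = []
    if pretty:
        s.append(space)
    s.append('<%s%s%s>' % (name, attribute_string, close))
    if pretty:
        s.append('\n')
    s.append(contents)
    if pretty and contents and contents[-1] != '\n':
        s.append('\n')                    # closing tag must start on a new line
    if pretty and close_tag:
        s.append(space)
    s.append(close_tag)
    if pretty and close_tag and has_next_sibling:
        s.append('\n')
    return ''.join(s)                     # one join instead of repeated concatenation
```

For example, `render_tag('b', [('id', 'x')], 'hi')` gives `'<b id="x">hi</b>'`. Collecting fragments and joining once avoids building many intermediate strings.
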
    def decompose(self):
        """Recursively destroys the contents of this tree."""
        self.extract()
        if len(self.contents) == 0:
            return
        current = self.contents[0]
        while current is not None:
            next = current.next
            if isinstance(current, Tag):
                del current.contents[:]
            current.parent = None
            current.previous = None
            current.previousSibling = None
            current.next = None
            current.nextSibling = None
            current = next

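`decompose()` walks the document-order `.next` chain from the node's first child and clears every link on each node it visits. As a result, the detached subtree no longer holds parent/child reference cycles. The following Python 3 sketch uses a hypothetical `Node` class with the same link names. It assumes the subtree has already been detached (the job `self.extract()` does above), so the `.next` chain ends at the last descendant; here that condition is met by decomposing a whole tree. Unlike the original, which clears `contents` only for `Tag` instances, every sketch node has a `contents` list.

```python
class Node:
    """Hypothetical stand-in for a Beautiful Soup 3 tree node."""
    def __init__(self, name, children=()):
        self.name = name
        self.contents = list(children)
        self.parent = self.previous = self.next = None
        self.previousSibling = self.nextSibling = None
        for i, child in enumerate(self.contents):
            child.parent = self
            if i:
                child.previousSibling = self.contents[i - 1]
                self.contents[i - 1].nextSibling = child


def thread(root):
    """Link all nodes in document (pre-)order via .previous/.next; return that order."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(node.contents))   # leftmost child is visited first
    for a, b in zip(order, order[1:]):
        a.next, b.previous = b, a
    return order


def decompose(node):
    """Clear the links of every descendant, following .next until it runs out."""
    if not node.contents:
        return
    current = node.contents[0]
    while current is not None:
        following = current.next            # save before the link is cleared
        del current.contents[:]             # empty the list in place
        current.parent = current.previous = current.next = None
        current.previousSibling = current.nextSibling = None
        current = following
```

After `decompose(root)`, every descendant has no remaining links. The objects can then be reclaimed by reference counting without waiting for the cycle collector.
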
    def prettify(self, encoding=DEFAULT_OUTPUT_ENCODING):
        return self.__str__(encoding, True)

    def renderContents(self, encoding=DEFAULT_OUTPUT_ENCODING,
                       prettyPrint=False, indentLevel=0):
        """Renders the contents of this tag as a string in the given
        encoding. If encoding is None, returns a Unicode string."""
        s = []
        for c in self:
            text = None
            if isinstance(c, NavigableString):
                text = c.__str__(encoding)
            elif isinstance(c, Tag):
                s.append(c.__str__(encoding, prettyPrint, indentLevel))
            if text and prettyPrint:
                text = text.strip()
            if text:
                if prettyPrint:
                    s.append(" " * (indentLevel-1))
                s.append(text)
                if prettyPrint:
                    s.append("\n")
        return ''.join(s)

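When pretty-printing, `renderContents()` strips each text child, so strings made only of whitespace disappear. Each remaining string goes on its own line, indented by `indentLevel - 1` spaces. The Python 3 sketch below isolates that text path under two assumptions: the children are plain `str` values (no nested tags), and `encoding=None` means "return `str`", as the docstring describes. The name `render_text_children` is hypothetical. It also encodes once at the end rather than per child, which gives the same bytes for a stateless encoding such as UTF-8.

```python
def render_text_children(children, encoding=None, pretty=False, indent_level=0):
    """Hypothetical sketch of the text branch of renderContents (no nested tags)."""
    s = []
    for text in children:
        if text and pretty:
            text = text.strip()           # whitespace-only strings become ''
        if text:                          # empty strings are skipped entirely
            if pretty:
                s.append(' ' * (indent_level - 1))
            s.append(text)
            if pretty:
                s.append('\n')
    result = ''.join(s)
    # Assumption: encoding=None means "give back text"; otherwise encode to bytes.
    return result if encoding is None else result.encode(encoding)
```
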
    #Soup methods

    def find(self, name=None, attrs={}, recursive=True, text=None,
             **kwargs):
        """Return only the first child of this Tag matching the given
        criteria."""
        r = None
        l = self.findAll(name, attrs, recursive, text, 1, **kwargs)
        if l:
            r = l[0]
        return r
    findChild = find

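`find()` is `findAll()` called with a limit of 1, then unwrapped to return the first match or `None`. Because the limit is passed down, the search can stop at the first hit instead of collecting every match. The Python 3 sketch below works over a plain iterable with a predicate; `find_all` and `find` here are hypothetical stand-ins, not the library's signatures.

```python
def find_all(items, predicate, limit=None):
    """Collect items satisfying predicate, stopping once limit results are found."""
    results = []
    for item in items:
        if predicate(item):
            results.append(item)
            if limit and len(results) >= limit:
                break                     # early exit: no need to scan the rest
    return results


def find(items, predicate):
    """First match or None, built on find_all(limit=1) as find() does above."""
    found = find_all(items, predicate, limit=1)
    return found[0] if found else None
```

A falsy `limit` (`None` or `0`) means "no limit" in this sketch.
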
    def findAll(self, name=None, attrs={}, recursive=True, text=None,
                limit=None, **kwargs):
        """Extracts a list of Tag objects that match the given
        criteria. You can specify the name of the Tag and any
        attributes you want the Tag to have.

        The value of a key-value pair in the 'attrs' map can be a
        string, a list of strings, a regular expression object, or a
        callable that takes a string and returns whether or not the
        string matches for some custom definition of 'matches'. The
        same is true of the tag name."""
        generator = self.recursiveChildGenerator
        if not recursive:
            generator = self.childGenerator
        return self._findAll(name, attrs, text, limit, generator, **kwargs)
    findChildren = findAll

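According to the `findAll()` docstring, a name or attribute criterion can be a string, a list of strings, a regular expression, or a callable. The Python 3 sketch below shows how such a criterion might be tested against a single value, with tags modeled as plain dicts in a hypothetical `{'name': ..., attr: value}` layout. Treating `True` as "attribute present" matches `_matches()` further down. Using `search` (not `match`) for compiled patterns is an assumption of this sketch.

```python
import re

_PATTERN_TYPE = type(re.compile(''))      # compiled-regex type, portable across 3.x


def matches(value, criterion):
    """Test one value against a criterion of the kinds findAll() accepts."""
    if criterion is None:
        return True                       # no constraint given
    if criterion is True:
        return value is not None          # "present with any value"
    if callable(criterion):
        return bool(criterion(value))     # custom predicate
    if value is None:
        return False
    if isinstance(criterion, _PATTERN_TYPE):
        return criterion.search(value) is not None
    if isinstance(criterion, (list, tuple, set)):
        return value in criterion         # any of several strings
    return value == criterion             # plain string


def find_all(tags, name=None, **attrs):
    """Hypothetical filter over dict-based tags, combining name and attribute criteria."""
    return [t for t in tags
            if matches(t['name'], name)
            and all(matches(t.get(k), v) for k, v in attrs.items())]
```
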
    # Pre-3.x compatibility methods
    first = find
    fetch = findAll

    def fetchText(self, text=None, recursive=True, limit=None):
        return self.findAll(text=text, recursive=recursive, limit=limit)

    def firstText(self, text=None, recursive=True):
        return self.find(text=text, recursive=recursive)

    #Private methods

    def _getAttrMap(self):
        """Initializes a map representation of this tag's attributes,
        if not already initialized."""
        if not getattr(self, 'attrMap'):
            self.attrMap = {}
            for (key, value) in self.attrs:
                self.attrMap[key] = value
        return self.attrMap

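`_getAttrMap()` keeps the original list of `(key, value)` pairs, which preserves attribute order, and builds a dictionary only when a keyed lookup first needs one. The Python 3 sketch below shows that lazy cache using a hypothetical `Element` class. This design has one caveat: after the dict is built, it must be updated or discarded whenever the pair list changes, otherwise lookups go stale.

```python
class Element:
    """Hypothetical stand-in for Tag: attributes kept as an ordered list of pairs."""

    def __init__(self, attrs):
        self.attrs = list(attrs)
        self._attr_map = None             # built lazily on first lookup

    def attr_map(self):
        if self._attr_map is None:
            # dict() over the pairs keeps the last value for a repeated key,
            # just like the assignment loop in _getAttrMap.
            self._attr_map = dict(self.attrs)
        return self._attr_map

    def get(self, key, default=None):
        return self.attr_map().get(key, default)
```
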
    #Generator methods
    def childGenerator(self):
        # Just use the iterator from the contents
        return iter(self.contents)

    def recursiveChildGenerator(self):
        if not len(self.contents):
            # A plain return ends the generator; raising StopIteration here
            # becomes a RuntimeError on Python 3.7+ (PEP 479).
            return
        stopNode = self._lastRecursiveChild().next
        current = self.contents[0]
        while current is not stopNode:
            yield current
            current = current.next


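Despite its name, `recursiveChildGenerator()` does not recurse. Every node in the parse tree sits on a single `.next` chain in document order, so a node's descendants form a contiguous run: it starts at the first child and ends just before whatever follows the last, deepest descendant (`_lastRecursiveChild().next`). The Python 3 sketch below uses a hypothetical `Node` class plus a `link()` helper that builds the chain the parser would normally maintain.

```python
class Node:
    """Hypothetical tree node with a document-order .next link."""
    def __init__(self, name, children=()):
        self.name = name
        self.contents = list(children)
        self.next = None


def link(root):
    """Thread every node onto one .next chain in pre-order (document order)."""
    order, stack = [], [root]
    while stack:
        node = stack.pop()
        order.append(node)
        stack.extend(reversed(node.contents))
    for a, b in zip(order, order[1:]):
        a.next = b
    return root


def last_recursive_child(node):
    """Follow last children down to the deepest, last descendant."""
    while node.contents:
        node = node.contents[-1]
    return node


def descendants(node):
    """Yield node's descendants by walking .next until the stop node."""
    if not node.contents:
        return                            # ends the generator cleanly (PEP 479-safe)
    stop = last_recursive_child(node).next
    current = node.contents[0]
    while current is not stop:
        yield current
        current = current.next
```

Each step of the walk costs O(1) and needs no stack, since the tree structure is already encoded in the `.next` links.
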
# Next, a couple classes to represent queries and their results.
class SoupStrainer:
    """Encapsulates a number of ways of matching a markup element (tag or
    text)."""

    def __init__(self, name=None, attrs={}, text=None, **kwargs):
        self.name = name
        if isinstance(attrs, basestring):
            kwargs['class'] = _match_css_class(attrs)
            attrs = None
        if kwargs:
            if attrs:
                attrs = attrs.copy()
                attrs.update(kwargs)
            else:
                attrs = kwargs
        self.attrs = attrs
        self.text = text

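`SoupStrainer.__init__` merges three ways of passing attribute criteria into a single dict. A bare string is treated as a CSS class, presumably because `class` is a reserved word and cannot be passed as a keyword argument. Keyword arguments are merged in, and an existing `attrs` dict is copied before the merge so the caller's dict is never mutated. The Python 3 sketch below expresses those rules as a hypothetical `normalize_attrs` function. It stores the class string as-is, whereas the code above wraps it with `_match_css_class`.

```python
def normalize_attrs(attrs=None, **kwargs):
    """Hypothetical sketch of the attrs/kwargs merging in SoupStrainer.__init__."""
    if isinstance(attrs, str):
        kwargs['class'] = attrs           # simplification: no _match_css_class wrapper
        attrs = None
    if kwargs:
        if attrs:
            attrs = dict(attrs)           # copy, so the caller's dict is untouched
            attrs.update(kwargs)          # keyword arguments win on conflicts
        else:
            attrs = kwargs
    return attrs
```
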
    def __str__(self):
        if self.text:
            return self.text
        else:
            return "%s|%s" % (self.name, self.attrs)

    def searchTag(self, markupName=None, markupAttrs={}):
        found = None
        markup = None
        if isinstance(markupName, Tag):
            markup = markupName
            markupAttrs = markup
        callFunctionWithTagData = callable(self.name) \
                                and not isinstance(markupName, Tag)

        if (not self.name) \
               or callFunctionWithTagData \
               or (markup and self._matches(markup, self.name)) \
               or (not markup and self._matches(markupName, self.name)):
            if callFunctionWithTagData:
                match = self.name(markupName, markupAttrs)
            else:
                match = True
                markupAttrMap = None
                for attr, matchAgainst in self.attrs.items():
                    if not markupAttrMap:
                        if hasattr(markupAttrs, 'get'):
                            markupAttrMap = markupAttrs
                        else:
                            markupAttrMap = {}
                            for k,v in markupAttrs:
                                markupAttrMap[k] = v
                    attrValue = markupAttrMap.get(attr)
                    if not self._matches(attrValue, matchAgainst):
                        match = False
                        break
            if match:
                if markup:
                    found = markup
                else:
                    found = markupName
        return found

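`searchTag()` accepts a tag in two forms: a `Tag`, or a bare name plus attributes given as either a dict or a list of pairs. It then applies the criteria in order. If `name` is callable, it receives the raw name and attributes and decides on its own. Otherwise, the name must match, then every attribute criterion must match, and the attributes are converted to a dict only when needed. The Python 3 sketch below makes simplifying assumptions. The hypothetical `search_tag` covers only the name-plus-attributes form, takes those values directly, and returns a boolean. Its `_matches` helper handles only `True`, callables, and equality.

```python
def _matches(value, criterion):
    """Simplified matcher: True means 'present', callables decide, else equality."""
    if criterion is True:
        return value is not None
    if callable(criterion):
        return bool(criterion(value))
    return value == criterion


def search_tag(name_criterion, attr_criteria, tag_name, tag_attrs):
    """Return True if (tag_name, tag_attrs) satisfies the name and attribute criteria."""
    if callable(name_criterion):
        # A callable name sees the raw name and attributes and decides on its own.
        return bool(name_criterion(tag_name, tag_attrs))
    if name_criterion and not _matches(tag_name, name_criterion):
        return False
    # Attributes may arrive as a dict or as a list of (key, value) pairs.
    attr_map = tag_attrs if hasattr(tag_attrs, 'get') else dict(tag_attrs)
    return all(_matches(attr_map.get(key), want)
               for key, want in attr_criteria.items())
```
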
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
947 def search(self, markup):
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
948 #print 'looking for %s in %s' % (self, markup)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
949 found = None
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
950 # If given a list of items, scan it for a text element that
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
951 # matches.
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
952 if hasattr(markup, "__iter__") \
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
953 and not isinstance(markup, Tag):
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
954 for element in markup:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
955 if isinstance(element, NavigableString) \
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
956 and self.search(element):
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
957 found = element
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
958 break
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
959 # If it's a Tag, make sure its name or attributes match.
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
960 # Don't bother with Tags if we're searching for text.
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
        elif isinstance(markup, Tag):
            if not self.text:
                found = self.searchTag(markup)
        # If it's text, make sure the text matches.
        elif isinstance(markup, NavigableString) or \
                 isinstance(markup, basestring):
            if self._matches(markup, self.text):
                found = markup
        else:
            raise Exception, "I don't know how to match against a %s" \
                  % markup.__class__
        return found

    def _matches(self, markup, matchAgainst):
        #print "Matching %s against %s" % (markup, matchAgainst)
        result = False
        if matchAgainst is True:
            result = markup is not None
        elif callable(matchAgainst):
            result = matchAgainst(markup)
        else:
            #Custom match methods take the tag as an argument, but all
            #other ways of matching match the tag name as a string.
            if isinstance(markup, Tag):
                markup = markup.name
            if markup and not isinstance(markup, basestring):
                markup = unicode(markup)
            #Now we know that markup is either a string, or None.
            if hasattr(matchAgainst, 'match'):
                # It's a regexp object.
                result = markup and matchAgainst.search(markup)
            elif hasattr(matchAgainst, '__iter__'): # list-like
                result = markup in matchAgainst
            elif hasattr(matchAgainst, 'items'):
                result = markup.has_key(matchAgainst)
            elif matchAgainst and isinstance(markup, basestring):
                if isinstance(markup, unicode):
                    matchAgainst = unicode(matchAgainst)
                else:
                    matchAgainst = str(matchAgainst)

            if not result:
                result = matchAgainst == markup
        return result
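The order of the tests in `_matches` determines how a criterion is read. `True` matches anything present, and a callable decides for itself. A compiled regex is applied with `search()`, so it may match anywhere in the string. A list-like is checked for membership, and everything else falls back to equality.

The following is a minimal Python 3 sketch of that dispatch, not code from Beautiful Soup. It makes these simplifications:

- `matches` is a hypothetical name.
- `Tag` objects are not modelled, so markup is just a string or `None`.
- The `items` branch is omitted because Python 2 dicts already take the `__iter__` branch.
- The result is coerced to a strict `bool`, where the original may return a match object.
- Python 3 strings define `__iter__` and Python 2 strings do not, so the list-like test excludes `str` explicitly.

```python
import re
from typing import Any


def matches(markup: Any, match_against: Any) -> bool:
    """Hypothetical Python 3 sketch of the dispatch order in _matches.

    Tag objects are not modelled: `markup` is a string (or None) standing
    in for a tag name or a text node.
    """
    result: Any = False
    if match_against is True:
        # True means "anything that is present".
        result = markup is not None
    elif callable(match_against):
        # A callable makes the decision itself.
        result = match_against(markup)
    else:
        if markup and not isinstance(markup, str):
            markup = str(markup)
        if hasattr(match_against, "match"):
            # A compiled regex: like the original, use search(), not match(),
            # so the pattern may hit anywhere in the string.
            result = markup and match_against.search(markup)
        elif hasattr(match_against, "__iter__") and not isinstance(match_against, str):
            # A list, tuple, set or dict: test membership.
            result = markup in match_against
        elif match_against and isinstance(markup, str):
            match_against = str(match_against)
        if not result:
            # Fall back to plain equality (this also makes None match None).
            result = match_against == markup
    return bool(result)


print(matches("td", ["td", "th"]))
print(matches("table", re.compile("ab")))
```

Note the last fallback: with both arguments `None`, the comparison `None == None` succeeds. As a result, `matches(None, None)` is true, and the same holds for `_matches` above.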

class ResultSet(list):
    """A ResultSet is just a list that keeps track of the SoupStrainer
    that created it."""
    def __init__(self, source):
        list.__init__(self)
        self.source = source

# Now, some helper functions.

def buildTagMap(default, *args):
    """Turns a list of maps, lists, or scalars into a single map.
    Used to build the SELF_CLOSING_TAGS, NESTABLE_TAGS, and
    RESET_NESTING_TAGS maps out of lists and partial maps."""
    built = {}
    for portion in args:
        if hasattr(portion, 'items'):
            #It's a map. Merge it.
            for k,v in portion.items():
                built[k] = v
        elif hasattr(portion, '__iter__'): # is a list
            #It's a list. Map each item to the default.
            for k in portion:
                built[k] = default
        else:
            #It's a scalar. Map it to the default.
            built[portion] = default
    return built
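Because later arguments overwrite earlier ones, `buildTagMap` lets a tag map be built from a plain list of tag names plus partial maps that override individual entries. Below is a minimal Python 3 sketch of the same merge, assuming only the branch order matters. `build_tag_map` is a hypothetical name. The explicit `str` check reproduces Python 2 behaviour, where strings have no `__iter__` and so take the scalar branch.

```python
from typing import Any, Dict


def build_tag_map(default: Any, *args: Any) -> Dict[Any, Any]:
    """Hypothetical Python 3 port of buildTagMap (same branch order)."""
    built: Dict[Any, Any] = {}
    for portion in args:
        if hasattr(portion, "items"):
            # A map: merge its entries; later arguments win.
            for key, value in portion.items():
                built[key] = value
        elif hasattr(portion, "__iter__") and not isinstance(portion, str):
            # A list-like: map every item to the default.
            for key in portion:
                built[key] = default
        else:
            # A scalar (including a single tag name): map it to the default.
            built[portion] = default
    return built


# Lists, maps and scalars can be mixed freely.
tag_map = build_tag_map(None, ["br", "hr"], {"p": ["p"]}, "img")
print(tag_map)  # {'br': None, 'hr': None, 'p': ['p'], 'img': None}
```

`__init__` calls `buildTagMap(None, selfClosingTags)`. With the default `selfClosingTags=None`, that `None` is treated as a scalar, so the instance map is `{None: None}` rather than empty.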

# Now, the parser classes.

class BeautifulStoneSoup(Tag, SGMLParser):

    """This class contains the basic parser and search code. It defines
    a parser that knows nothing about tag behavior except for the
    following:

      You can't close a tag without closing all the tags it encloses.
      That is, "<foo><bar></foo>" actually means
      "<foo><bar></bar></foo>".

    [Another possible explanation is "<foo><bar /></foo>", but since
    this class defines no SELF_CLOSING_TAGS, it will never use that
    explanation.]

    This class is useful for parsing XML or made-up markup languages,
    or when BeautifulSoup makes an assumption counter to what you were
    expecting."""

    SELF_CLOSING_TAGS = {}
    NESTABLE_TAGS = {}
    RESET_NESTING_TAGS = {}
    QUOTE_TAGS = {}
    PRESERVE_WHITESPACE_TAGS = []

    MARKUP_MASSAGE = [(re.compile('(<[^<>]*)/>'),
                       lambda x: x.group(1) + ' />'),
                      (re.compile('<!\s+([^<>]*)>'),
                       lambda x: '<!' + x.group(1) + '>')
                      ]

    ROOT_TAG_NAME = u'[document]'

    HTML_ENTITIES = "html"
    XML_ENTITIES = "xml"
    XHTML_ENTITIES = "xhtml"
    # TODO: This only exists for backwards-compatibility
    ALL_ENTITIES = XHTML_ENTITIES

    # Used when determining whether a text node is all whitespace and
    # can be replaced with a single space. A text node that contains
    # fancy Unicode spaces (usually non-breaking) should be left
    # alone.
    STRIP_ASCII_SPACES = { 9: None, 10: None, 12: None, 13: None, 32: None, }

    def __init__(self, markup="", parseOnlyThese=None, fromEncoding=None,
                 markupMassage=True, smartQuotesTo=XML_ENTITIES,
                 convertEntities=None, selfClosingTags=None, isHTML=False):
        """The Soup object is initialized as the 'root tag', and the
        provided markup (which can be a string or a file-like object)
        is fed into the underlying parser.

        sgmllib will process most bad HTML, and the BeautifulSoup
        class has some tricks for dealing with some HTML that kills
        sgmllib, but Beautiful Soup can nonetheless choke or lose data
        if your data uses self-closing tags or declarations
        incorrectly.

        By default, Beautiful Soup uses regexes to sanitize input,
        avoiding the vast majority of these problems. If the problems
        don't apply to you, pass in False for markupMassage, and
        you'll get better performance.

        The default parser massage techniques fix the two most common
        instances of invalid HTML that choke sgmllib:

         <br/> (No space between name of self-closing tag and tag close)
         <! --Comment--> (Extraneous whitespace in declaration)

        You can pass in a custom list of (RE object, replace method)
        tuples to get Beautiful Soup to scrub your input the way you
        want."""

        self.parseOnlyThese = parseOnlyThese
        self.fromEncoding = fromEncoding
        self.smartQuotesTo = smartQuotesTo
        self.convertEntities = convertEntities
        # Set the rules for how we'll deal with the entities we
        # encounter
        if self.convertEntities:
            # It doesn't make sense to convert encoded characters to
            # entities even while you're converting entities to Unicode.
            # Just convert it all to Unicode.
            self.smartQuotesTo = None
            if convertEntities == self.HTML_ENTITIES:
                self.convertXMLEntities = False
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = True
            elif convertEntities == self.XHTML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = True
                self.escapeUnrecognizedEntities = False
            elif convertEntities == self.XML_ENTITIES:
                self.convertXMLEntities = True
                self.convertHTMLEntities = False
                self.escapeUnrecognizedEntities = False
        else:
            self.convertXMLEntities = False
            self.convertHTMLEntities = False
            self.escapeUnrecognizedEntities = False

        self.instanceSelfClosingTags = buildTagMap(None, selfClosingTags)
        SGMLParser.__init__(self)

        if hasattr(markup, 'read'):        # It's a file-type object.
            markup = markup.read()
        self.markup = markup
        self.markupMassage = markupMassage
        try:
            self._feed(isHTML=isHTML)
        except StopParsing:
            pass
        self.markup = None                 # The markup can now be GCed

    def convert_charref(self, name):
        """This method fixes a bug in Python's SGMLParser."""
        try:
            n = int(name)
        except ValueError:
            return
        if not 0 <= n <= 127 : # ASCII ends at 127, not 255
            return
        return self.convert_codepoint(n)

    def _feed(self, inDocumentEncoding=None, isHTML=False):
        # Convert the document to Unicode.
        markup = self.markup
        if isinstance(markup, unicode):
            if not hasattr(self, 'originalEncoding'):
                self.originalEncoding = None
        else:
            dammit = UnicodeDammit\
                     (markup, [self.fromEncoding, inDocumentEncoding],
                      smartQuotesTo=self.smartQuotesTo, isHTML=isHTML)
            markup = dammit.unicode
            self.originalEncoding = dammit.originalEncoding
            self.declaredHTMLEncoding = dammit.declaredHTMLEncoding
        if markup:
            if self.markupMassage:
                if not hasattr(self.markupMassage, "__iter__"):
                    self.markupMassage = self.MARKUP_MASSAGE
                for fix, m in self.markupMassage:
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1177 markup = fix.sub(m, markup)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1178 # TODO: We get rid of markupMassage so that the
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1179 # soup object can be deepcopied later on. Some
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1180 # Python installations can't copy regexes. If anyone
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1181 # was relying on the existence of markupMassage, this
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1182 # might cause problems.
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1183 del(self.markupMassage)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1184 self.reset()
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1185
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1186 SGMLParser.feed(self, markup)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1187 # Close out any unfinished strings and close all the open tags.
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1188 self.endData()
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1189 while self.currentTag.name != self.ROOT_TAG_NAME:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1190 self.popTag()
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1191
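In `_feed`, each `(fix, m)` pair in `markupMassage` is a compiled regular expression and its replacement. Each pair is applied with `fix.sub(m, markup)` before the document reaches `SGMLParser.feed`. The loop can be sketched in Python 3 as follows. The two fixes in `MASSAGE` are illustrative assumptions and do not quote `MARKUP_MASSAGE`: one turns `<br/>` into `<br />`, and the other drops whitespace after `<!`.

```python
import re

# Hypothetical (pattern, replacement) pairs. A replacement may be a callable,
# which re.sub calls with each match object.
MASSAGE = [
    (re.compile(r'(<[^<>]*)/>'), lambda m: m.group(1) + ' />'),
    (re.compile(r'<!\s+([^<>]*)>'), lambda m: '<!' + m.group(1) + '>'),
]


def massage(markup, fixes=MASSAGE):
    """Apply each regex fix in order, as the loop in _feed does."""
    for fix, repl in fixes:
        markup = fix.sub(repl, markup)
    return markup


print(massage('<br/><! DOCTYPE html>'))  # <br /><!DOCTYPE html>
```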
    def __getattr__(self, methodName):
        """This method routes method call requests to either the SGMLParser
        superclass or the Tag superclass, depending on the method name."""
        #print "__getattr__ called on %s.%s" % (self.__class__, methodName)

        if methodName.startswith('start_') or methodName.startswith('end_') \
               or methodName.startswith('do_'):
            return SGMLParser.__getattr__(self, methodName)
        elif not methodName.startswith('__'):
            return Tag.__getattr__(self, methodName)
        else:
            raise AttributeError

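Python calls `__getattr__` only after normal attribute lookup fails, so this method decides which parent class handles a missing name. Names starting with `start_`, `end_` or `do_` go to `SGMLParser`, other non-dunder names go to `Tag`, and dunder names raise `AttributeError`. Below is a self-contained Python 3 sketch of that routing. `Parser`, `Node` and `Soup` are hypothetical stand-ins, and the child-lookup behaviour of `Node` is an assumption made for illustration.

```python
class Parser:
    """Stand-in for the SGML side: unknown handler names do not exist."""
    def __getattr__(self, name):
        raise AttributeError(name)


class Node:
    """Stand-in for the tree side: unknown names are treated as child lookups."""
    def __getattr__(self, name):
        return "child <%s>" % name


class Soup(Parser, Node):
    def __getattr__(self, name):
        # Handler-style names go to Parser, so an undefined handler raises
        # AttributeError instead of falling through to Node's child lookup.
        if name.startswith(('start_', 'end_', 'do_')):
            return Parser.__getattr__(self, name)
        # Other ordinary names use the tree-navigation behaviour.
        elif not name.startswith('__'):
            return Node.__getattr__(self, name)
        # Dunder probes must fail normally.
        raise AttributeError(name)


s = Soup()
print(s.title)                    # child <title>
print(hasattr(s, 'start_title'))  # False
```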
    def isSelfClosingTag(self, name):
        """Returns true iff the given string is the name of a
        self-closing tag according to this parser."""
        return self.SELF_CLOSING_TAGS.has_key(name) \
               or self.instanceSelfClosingTags.has_key(name)

    def reset(self):
        Tag.__init__(self, self, self.ROOT_TAG_NAME)
        self.hidden = 1
        SGMLParser.reset(self)
        self.currentData = []
        self.currentTag = None
        self.tagStack = []
        self.quoteStack = []
        self.pushTag(self)

    def popTag(self):
        tag = self.tagStack.pop()

        #print "Pop", tag.name
        if self.tagStack:
            self.currentTag = self.tagStack[-1]
        return self.currentTag

    def pushTag(self, tag):
        #print "Push", tag.name
        if self.currentTag:
            self.currentTag.contents.append(tag)
        self.tagStack.append(tag)
        self.currentTag = self.tagStack[-1]

    def endData(self, containerClass=NavigableString):
        if self.currentData:
            currentData = u''.join(self.currentData)
            if (currentData.translate(self.STRIP_ASCII_SPACES) == '' and
                not set([tag.name for tag in self.tagStack]).intersection(
                    self.PRESERVE_WHITESPACE_TAGS)):
                if '\n' in currentData:
                    currentData = '\n'
                else:
                    currentData = ' '
            self.currentData = []
            if self.parseOnlyThese and len(self.tagStack) <= 1 and \
                   (not self.parseOnlyThese.text or \
                    not self.parseOnlyThese.search(currentData)):
                return
            o = containerClass(currentData)
            o.setup(self.currentTag, self.previous)
            if self.previous:
                self.previous.next = o
            self.previous = o
            self.currentTag.contents.append(o)


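`endData` flushes the buffered text. A run made only of ASCII whitespace collapses to a single `'\n'` if it contains a newline, and to a single `' '` otherwise. The exception is when a tag in `PRESERVE_WHITESPACE_TAGS` is open. Below is a Python 3 sketch of that rule. The function name and the default `preserve` set (`pre`, `textarea`) are assumptions. The sketch also replaces the `translate(self.STRIP_ASCII_SPACES) == ''` test with `str.strip` over an explicit set of ASCII whitespace characters.

```python
ASCII_SPACES = '\t\n\x0c\r '   # tab, LF, FF, CR, space (assumed set)


def collapse_whitespace(data, open_tags, preserve=frozenset({'pre', 'textarea'})):
    """Collapse a whitespace-only text run, as endData does.

    data      -- the joined text buffered since the last tag
    open_tags -- names of the tags currently on the stack
    preserve  -- tags inside which whitespace is kept verbatim (assumed)
    """
    # strip() leaves '' only if every character is ASCII whitespace.
    if data.strip(ASCII_SPACES) == '' and not set(open_tags) & preserve:
        return '\n' if '\n' in data else ' '
    return data
```

The check looks at every open tag, not only the innermost one. As a result, whitespace anywhere under a `<pre>` is kept as-is.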
    def _popToTag(self, name, inclusivePop=True):
        """Pops the tag stack up to and including the most recent
        instance of the given tag. If inclusivePop is false, pops the tag
        stack up to but *not* including the most recent instance of
        the given tag."""
        #print "Popping to %s" % name
        if name == self.ROOT_TAG_NAME:
            return

        numPops = 0
        mostRecentTag = None
        for i in range(len(self.tagStack)-1, 0, -1):
            if name == self.tagStack[i].name:
                numPops = len(self.tagStack)-i
                break
        if not inclusivePop:
            numPops = numPops - 1

        for i in range(0, numPops):
            mostRecentTag = self.popTag()
        return mostRecentTag

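`_popToTag` walks the stack from the top down and stops before index 0, the root. It finds the most recent tag with the given name and pops everything above it, plus the tag itself when `inclusivePop` is true. If there is no match, nothing is popped. Below is the same logic on a plain list of tag names, written as a hypothetical Python 3 function `pop_to_tag`. Unlike the method, it returns the popped names rather than the new current tag.

```python
def pop_to_tag(stack, name, inclusive=True):
    """Pop `stack` (a list of tag names, root at index 0) back to `name`.

    Searches from the top down, skipping the root at index 0. If `name`
    is not found, nothing is popped. Returns the popped names, top first.
    """
    num_pops = 0
    for i in range(len(stack) - 1, 0, -1):
        if stack[i] == name:
            num_pops = len(stack) - i
            break
    if not inclusive:
        num_pops -= 1           # leave the matched tag itself on the stack
    popped = []
    for _ in range(num_pops):   # range(0) or range(-1) pops nothing
        popped.append(stack.pop())
    return popped


stack = ['[document]', 'html', 'body', 'p', 'b']
print(pop_to_tag(stack, 'p'))  # ['b', 'p']
print(stack)                   # ['[document]', 'html', 'body']
```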
    def _smartPop(self, name):

        """We need to pop up to the previous tag of this type, unless
        one of this tag's nesting reset triggers comes between this
        tag and the previous tag of this type, OR unless this tag is a
        generic nesting trigger and another generic nesting trigger
        comes between this tag and the previous tag of this type.

        Examples:
         <p>Foo<b>Bar *<p>* should pop to 'p', not 'b'.
         <p>Foo<table>Bar *<p>* should pop to 'table', not 'p'.
         <p>Foo<table><tr>Bar *<p>* should pop to 'tr', not 'p'.

         <li><ul><li> *<li>* should pop to 'ul', not the first 'li'.
         <tr><table><tr> *<tr>* should pop to 'table', not the first 'tr'
         <td><tr><td> *<td>* should pop to 'tr', not the first 'td'
        """

        nestingResetTriggers = self.NESTABLE_TAGS.get(name)
        isNestable = nestingResetTriggers != None
        isResetNesting = self.RESET_NESTING_TAGS.has_key(name)
        popTo = None
        inclusive = True
        for i in range(len(self.tagStack)-1, 0, -1):
            p = self.tagStack[i]
            if (not p or p.name == name) and not isNestable:
                #Non-nestable tags get popped to the top or to their
                #last occurrence.
                popTo = name
                break
            if (nestingResetTriggers is not None
                and p.name in nestingResetTriggers) \
                or (nestingResetTriggers is None and isResetNesting
                    and self.RESET_NESTING_TAGS.has_key(p.name)):

                #If we encounter one of the nesting reset triggers
                #peculiar to this tag, or we encounter another tag
                #that causes nesting to reset, pop up to but not
                #including that tag.
                popTo = p.name
                inclusive = False
                break
            p = p.parent
        if popTo:
            self._popToTag(popTo, inclusive)

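The decision `_smartPop` makes can be traced on a list of tag names. In this Python 3 sketch, `NESTABLE` and `RESET_NESTING` are small hypothetical tables. They were chosen only so that the six docstring examples come out as described, and they are not the parser's real `NESTABLE_TAGS` and `RESET_NESTING_TAGS`. The function returns the `(popTo, inclusive)` pair that would be passed to `_popToTag`, or `None` when nothing is popped. The always-false `not p` test is left out.

```python
# Hypothetical, heavily trimmed nesting tables (assumptions for illustration).
NESTABLE = {           # tag -> tags that reset its nesting
    'li': ['ul', 'ol'],
    'td': ['tr'],
    'tr': ['table'],
}
RESET_NESTING = {'p', 'table', 'ul', 'ol', 'tr'}


def smart_pop_target(stack, name):
    """Return (popTo, inclusive) for a new start tag `name`, or None.

    `stack` is a list of open tag names with the root at index 0.
    """
    triggers = NESTABLE.get(name)
    is_nestable = triggers is not None
    is_reset_nesting = name in RESET_NESTING
    for tag in reversed(stack[1:]):          # top of stack first, skip root
        if tag == name and not is_nestable:
            return (name, True)              # close the previous same-named tag
        if (triggers is not None and tag in triggers) or \
           (triggers is None and is_reset_nesting and tag in RESET_NESTING):
            return (tag, False)              # pop up to, but not including, tag
    return None


print(smart_pop_target(['[document]', 'li', 'ul', 'li'], 'li'))  # ('ul', False)
```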
    def unknown_starttag(self, name, attrs, selfClosing=0):
        #print "Start tag %s: %s" % (name, attrs)
        if self.quoteStack:
            #This is not a real tag.
            #print "<%s> is not real!" % name
            attrs = ''.join([' %s="%s"' % (x, y) for x, y in attrs])
            self.handle_data('<%s%s>' % (name, attrs))
            return
        self.endData()

        if not self.isSelfClosingTag(name) and not selfClosing:
            self._smartPop(name)

        if self.parseOnlyThese and len(self.tagStack) <= 1 \
               and (self.parseOnlyThese.text or not self.parseOnlyThese.searchTag(name, attrs)):
            return

        tag = Tag(self, name, attrs, self.currentTag, self.previous)
        if self.previous:
            self.previous.next = tag
        self.previous = tag
        self.pushTag(tag)
        if selfClosing or self.isSelfClosingTag(name):
            self.popTag()
        if name in self.QUOTE_TAGS:
            #print "Beginning quote (%s)" % name
            self.quoteStack.append(name)
            self.literal = 1
        return tag

    def unknown_endtag(self, name):
        #print "End tag %s" % name
        if self.quoteStack and self.quoteStack[-1] != name:
            #This is not a real end tag.
            #print "</%s> is not real!" % name
            self.handle_data('</%s>' % name)
            return
        self.endData()
        self._popToTag(name)
        if self.quoteStack and self.quoteStack[-1] == name:
            self.quoteStack.pop()
            self.literal = (len(self.quoteStack) > 0)

f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
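The quote-stack handling in `unknown_starttag` and `unknown_endtag` can be modeled in isolation. The sketch below is a hypothetical, simplified stand-in: the class `QuoteStackModel` and its event list are invented for illustration and are not part of the module. It also assumes that start tags seen while the quote stack is non-empty are kept as text, which the beginning of `unknown_starttag` (outside this excerpt) is expected to do.

```python
# Hypothetical model of the QUOTE_TAGS / quoteStack logic; not part of
# BeautifulSoup's API. Events are recorded instead of building a tree.
QUOTE_TAGS = {"script", "textarea"}


class QuoteStackModel:
    def __init__(self):
        self.quote_stack = []
        self.events = []  # ("start" | "end" | "text", value) tuples

    def start_tag(self, name):
        if self.quote_stack:
            # Assumed behavior: inside a quote tag, markup is plain text.
            self.events.append(("text", "<%s>" % name))
            return
        self.events.append(("start", name))
        if name in QUOTE_TAGS:
            self.quote_stack.append(name)

    def end_tag(self, name):
        if self.quote_stack and self.quote_stack[-1] != name:
            # Not the end of the open quote tag: keep it as literal text.
            self.events.append(("text", "</%s>" % name))
            return
        self.events.append(("end", name))
        if self.quote_stack and self.quote_stack[-1] == name:
            self.quote_stack.pop()


model = QuoteStackModel()
for kind, name in [("start", "script"), ("start", "b"),
                   ("end", "b"), ("end", "script")]:
    (model.start_tag if kind == "start" else model.end_tag)(name)
print(model.events)
# [('start', 'script'), ('text', '<b>'), ('text', '</b>'), ('end', 'script')]
```

Only the matching `</script>` pops the stack; the `<b>` pair inside it survives as text.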
    def handle_data(self, data):
        self.currentData.append(data)

    def _toStringSubclass(self, text, subclass):
        """Adds a certain piece of text to the tree as a NavigableString
        subclass."""
        self.endData()
        self.handle_data(text)
        self.endData(subclass)

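`_toStringSubclass` first flushes any pending text, then buffers the new text and flushes it again as the requested subclass, so the special string never merges with ordinary text around it. A minimal sketch of that pattern, assuming simplified stand-ins: `TextAccumulator`, `end_data` and the two string classes here are hypothetical, and the sketch ignores whatever extra work (such as whitespace handling) the real `endData` may do.

```python
# Hypothetical stand-ins for the module's NavigableString subclasses.
class NavigableString(str):
    pass


class Comment(NavigableString):
    pass


class TextAccumulator:
    """Simplified model of handle_data/endData/_toStringSubclass."""

    def __init__(self):
        self.current_data = []  # pending text fragments
        self.nodes = []         # finished string nodes

    def handle_data(self, data):
        self.current_data.append(data)

    def end_data(self, cls=NavigableString):
        # Join the pending fragments into one node of the requested class.
        if self.current_data:
            self.nodes.append(cls("".join(self.current_data)))
            self.current_data = []

    def to_string_subclass(self, text, cls):
        self.end_data()        # flush ordinary text first
        self.handle_data(text)
        self.end_data(cls)     # then wrap `text` on its own


acc = TextAccumulator()
acc.handle_data("Hello, ")
acc.handle_data("world")
acc.to_string_subclass(" a comment ", Comment)
print([(type(n).__name__, str(n)) for n in acc.nodes])
# [('NavigableString', 'Hello, world'), ('Comment', ' a comment ')]
```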
    def handle_pi(self, text):
        """Handle a processing instruction as a ProcessingInstruction
        object, possibly one with a %SOUP-ENCODING% slot into which an
        encoding will be plugged later."""
        if text[:3] == "xml":
            text = u"xml version='1.0' encoding='%SOUP-ENCODING%'"
        self._toStringSubclass(text, ProcessingInstruction)

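`handle_pi` defers the encoding decision: any PI starting with `xml` is replaced by a fixed declaration whose encoding is a `%SOUP-ENCODING%` placeholder, presumably so the declaration can match whatever encoding the document is finally written in. The two phases can be sketched as below. `normalize_pi` mirrors the lines above; `fill_encoding` is a hypothetical output-time step, since the module's own substitution code is outside this excerpt.

```python
ENCODING_SLOT = "%SOUP-ENCODING%"


def normalize_pi(text):
    # Mirrors handle_pi: an "xml..." PI is replaced by a fixed template
    # whose encoding is left as a placeholder.
    if text[:3] == "xml":
        return "xml version='1.0' encoding='%s'" % ENCODING_SLOT
    return text


def fill_encoding(text, encoding):
    # Hypothetical output-time step: plug the real encoding into the slot.
    return text.replace(ENCODING_SLOT, encoding)


pi = normalize_pi('xml version="1.0" encoding="latin-1"?')
print(fill_encoding(pi, "utf-8"))
# xml version='1.0' encoding='utf-8'
```

Note that the encoding originally declared in the document (`latin-1` here) is discarded at parse time.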
    def handle_comment(self, text):
        "Handle comments as Comment objects."
        self._toStringSubclass(text, Comment)

    def handle_charref(self, ref):
        "Handle character references as data."
        if self.convertEntities:
            data = unichr(int(ref))
        else:
            data = '&#%s;' % ref
        self.handle_data(data)

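`handle_charref` either decodes a numeric character reference or writes it back unchanged. A Python 3 sketch of the same choice, where `chr()` plays the role of Python 2's `unichr()`. The function name is hypothetical, and like the original line the sketch assumes `ref` holds decimal digits only: a hexadecimal form such as `x41` would make `int()` raise `ValueError`.

```python
def charref_to_text(ref, convert_entities):
    """Python 3 sketch of handle_charref's choice for a decimal ref."""
    if convert_entities:
        return chr(int(ref))   # chr() plays the role of Python 2's unichr()
    return "&#%s;" % ref       # otherwise keep the reference verbatim


print(charref_to_text("233", True))   # é
print(charref_to_text("233", False))  # &#233;
```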
    def handle_entityref(self, ref):
        """Handle entity references as data, possibly converting known
        HTML and/or XML entity references to the corresponding Unicode
        characters."""
        data = None
        if self.convertHTMLEntities:
            try:
                data = unichr(name2codepoint[ref])
            except KeyError:
                pass

        if not data and self.convertXMLEntities:
            data = self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)

        if not data and self.convertHTMLEntities and \
            not self.XML_ENTITIES_TO_SPECIAL_CHARS.get(ref):
            # TODO: We've got a problem here. We're told this is
            # an entity reference, but it's not an XML entity
            # reference or an HTML entity reference. Nonetheless,
            # the logical thing to do is to pass it through as an
            # unrecognized entity reference.
            #
            # Except: when the input is "&carol;" this function
            # will be called with input "carol". When the input is
            # "AT&T", this function will be called with input
            # "T". We have no way of knowing whether a semicolon
            # was present originally, so we don't know whether
            # this is an unknown entity or just a misplaced
            # ampersand.
            #
            # The more common case is a misplaced ampersand, so I
            # escape the ampersand and omit the trailing semicolon.
            data = "&amp;%s" % ref
        if not data:
            # This case is different from the one above, because we
            # haven't already gone through a supposedly comprehensive
            # mapping of entities to Unicode characters. We might not
            # have gone through any mapping at all. So the chances are
            # very high that this is a real entity, and not a
            # misplaced ampersand.
            data = "&%s;" % ref
        self.handle_data(data)

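The lookup order in `handle_entityref` is: the HTML table, then the XML table, then one of two fallbacks depending on whether a "comprehensive" HTML lookup was attempted. The sketch below reproduces that order in Python 3 with the standard library's `html.entities.name2codepoint` (the Python 3 home of the `name2codepoint` table). The five-entry XML table is an assumption; the module defines its own `XML_ENTITIES_TO_SPECIAL_CHARS` outside this excerpt. The function name and keyword arguments are hypothetical.

```python
from html.entities import name2codepoint

# Assumed contents; the module defines its own table elsewhere.
XML_ENTITIES_TO_SPECIAL_CHARS = {"apos": "'", "quot": '"', "amp": "&",
                                 "lt": "<", "gt": ">"}


def resolve_entityref(ref, convert_html=True, convert_xml=True):
    data = None
    if convert_html and ref in name2codepoint:
        data = chr(name2codepoint[ref])
    if not data and convert_xml:
        data = XML_ENTITIES_TO_SPECIAL_CHARS.get(ref)
    if not data and convert_html and ref not in XML_ENTITIES_TO_SPECIAL_CHARS:
        # Unknown everywhere: most likely a bare ampersand, as in "AT&T".
        data = "&amp;%s" % ref
    if not data:
        # No table applied, so pass the reference through untouched.
        data = "&%s;" % ref
    return data


print(resolve_entityref("eacute"))  # é
print(resolve_entityref("T"))       # &amp;T  (from "AT&T")
print(resolve_entityref("eacute", convert_html=False, convert_xml=False))
# &eacute;
```

The asymmetry is deliberate in the original: an unknown name after a full HTML lookup is assumed to be a stray ampersand, while an unlooked-up name is assumed to be a real entity.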
    def handle_decl(self, data):
        "Handle DOCTYPEs and the like as Declaration objects."
        self._toStringSubclass(data, Declaration)

    def parse_declaration(self, i):
        """Treat a bogus SGML declaration as raw data. Treat a CDATA
        declaration as a CData object."""
        j = None
        if self.rawdata[i:i+9] == '<![CDATA[':
            k = self.rawdata.find(']]>', i)
            if k == -1:
                k = len(self.rawdata)
            data = self.rawdata[i+9:k]
            j = k+3
            self._toStringSubclass(data, CData)
        else:
            try:
                j = SGMLParser.parse_declaration(self, i)
            except SGMLParseError:
                toHandle = self.rawdata[i:]
                self.handle_data(toHandle)
                j = i + len(toHandle)
        return j

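The CDATA branch of `parse_declaration` is a plain string scan: skip the 9-character `<![CDATA[` opener, find `]]>`, slice out the content, and return the index just past the terminator. A standalone sketch (the function name and return shape are hypothetical; the logic follows the lines above):

```python
CDATA_OPEN = "<![CDATA["
CDATA_CLOSE = "]]>"


def scan_cdata(rawdata, i):
    """Return (content, next_index) for a CDATA section starting at i,
    or None if rawdata[i:] does not start one."""
    if rawdata[i:i + len(CDATA_OPEN)] != CDATA_OPEN:
        return None
    k = rawdata.find(CDATA_CLOSE, i)
    if k == -1:
        # Unterminated section: treat the rest of the input as content.
        k = len(rawdata)
    return rawdata[i + len(CDATA_OPEN):k], k + len(CDATA_CLOSE)


print(scan_cdata("x<![CDATA[a<b]]>y", 1))  # ('a<b', 16)
print(scan_cdata("<![CDATA[abc", 0))       # ('abc', 15)
```

As with `j = k+3` in the original, the unterminated case returns an index past the end of the input, which a caller looping while the index is below `len(rawdata)` treats as end of input.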
class BeautifulSoup(BeautifulStoneSoup):

    """This parser knows the following facts about HTML:

    * Some tags have no closing tag and should be interpreted as being
      closed as soon as they are encountered.

    * The text inside some tags (e.g. 'script') may contain tags which
      are not really part of the document and which should be parsed
      as text, not tags. If you want to parse the text as tags, you can
      always fetch it and parse it explicitly.

    * Tag nesting rules:

      Most tags can't be nested at all. For instance, the occurrence of
      a <p> tag should implicitly close the previous <p> tag.

       <p>Para1<p>Para2
      should be transformed into:
       <p>Para1</p><p>Para2

      Some tags can be nested arbitrarily. For instance, the occurrence
      of a <blockquote> tag should _not_ implicitly close the previous
      <blockquote> tag.

       Alice said: <blockquote>Bob said: <blockquote>Blah
      should NOT be transformed into:
       Alice said: <blockquote>Bob said: </blockquote><blockquote>Blah

      Some tags can be nested, but the nesting is reset by the
      interposition of other tags. For instance, a <tr> tag should
      implicitly close the previous <tr> tag within the same <table>,
      but not close a <tr> tag in another table.

       <table><tr>Blah<tr>Blah
      should be transformed into:
       <table><tr>Blah</tr><tr>Blah
      but,
       <tr>Blah<table><tr>Blah
      should NOT be transformed into:
       <tr>Blah<table></tr><tr>Blah

    Differing assumptions about tag nesting rules are a major source
    of problems with the BeautifulSoup class. If BeautifulSoup is not
    treating as nestable a tag your page author treats as nestable,
    try ICantBelieveItsBeautifulSoup, MinimalSoup, or
    BeautifulStoneSoup before writing your own subclass."""

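The three nesting rules in the docstring can be expressed as one stack operation. The sketch below is a simplified model, not the module's actual implementation. It assumes each tag maps to a list of "reset trigger" tags taken from the `NESTABLE_*` tables of this class: a tag absent from the map is non-nestable, an empty list means arbitrarily nestable, and a non-empty list means nesting resets at the nearest trigger. The stack holds only open tag names (text such as "Para1" is omitted), and `open_tag`, `parse` and `NESTING_TRIGGERS` are hypothetical names.

```python
# Assumed trigger lists, modeled on the NESTABLE_* tables of this class.
NESTING_TRIGGERS = {
    "blockquote": [],
    "table": [],
    "tr": ["table", "tbody", "tfoot", "thead"],
    "td": ["tr"],
}


def open_tag(stack, name, triggers_map=NESTING_TRIGGERS):
    """Push `name` onto `stack` (a list of open tag names), first popping
    the tags it implicitly closes."""
    triggers = triggers_map.get(name)  # None means not nestable
    for pos in range(len(stack) - 1, -1, -1):
        tag = stack[pos]
        if triggers is None and tag == name:
            del stack[pos:]        # close the earlier tag and its contents
            break
        if triggers and tag in triggers:
            del stack[pos + 1:]    # pop back to, not including, the trigger
            break
    stack.append(name)
    return stack


def parse(tags):
    stack = []
    for name in tags:
        open_tag(stack, name)
    return stack


print(parse(["p", "p"]))                    # ['p']
print(parse(["blockquote", "blockquote"]))  # ['blockquote', 'blockquote']
print(parse(["table", "tr", "tr"]))         # ['table', 'tr']
print(parse(["tr", "table", "tr"]))         # ['tr', 'table', 'tr']
```

Each printed stack corresponds to one docstring example: the second `<p>` closes the first, blockquotes stack, a `<tr>` closes its sibling within the same table, and a `<table>` shields the outer `<tr>`.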
    def __init__(self, *args, **kwargs):
        if not kwargs.has_key('smartQuotesTo'):
            kwargs['smartQuotesTo'] = self.HTML_ENTITIES
        kwargs['isHTML'] = True
        BeautifulStoneSoup.__init__(self, *args, **kwargs)

    SELF_CLOSING_TAGS = buildTagMap(None,
                                    ('br' , 'hr', 'input', 'img', 'meta',
                                     'spacer', 'link', 'frame', 'base', 'col'))

    PRESERVE_WHITESPACE_TAGS = set(['pre', 'textarea'])

    QUOTE_TAGS = {'script' : None, 'textarea' : None}

    #According to the HTML standard, each of these inline tags can
    #contain another tag of the same type. Furthermore, it's common
    #to actually use these tags this way.
    NESTABLE_INLINE_TAGS = ('span', 'font', 'q', 'object', 'bdo', 'sub', 'sup',
                            'center')

    #According to the HTML standard, these block tags can contain
    #another tag of the same type. Furthermore, it's common
    #to actually use these tags this way.
    NESTABLE_BLOCK_TAGS = ('blockquote', 'div', 'fieldset', 'ins', 'del')

    #Lists can contain other lists, but there are restrictions.
    NESTABLE_LIST_TAGS = { 'ol' : [],
                           'ul' : [],
                           'li' : ['ul', 'ol'],
                           'dl' : [],
                           'dd' : ['dl'],
                           'dt' : ['dl'] }

    #Tables can contain other tables, but there are restrictions.
    NESTABLE_TABLE_TAGS = {'table' : [],
                           'tr' : ['table', 'tbody', 'tfoot', 'thead'],
                           'td' : ['tr'],
                           'th' : ['tr'],
                           'thead' : ['table'],
                           'tbody' : ['table'],
                           'tfoot' : ['table'],
                           }

1558 NON_NESTABLE_BLOCK_TAGS = ('address', 'form', 'p', 'pre')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1559
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1560 #If one of these tags is encountered, all tags up to the next tag of
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1561 #this type are popped.
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1562 RESET_NESTING_TAGS = buildTagMap(None, NESTABLE_BLOCK_TAGS, 'noscript',
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1563 NON_NESTABLE_BLOCK_TAGS,
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1564 NESTABLE_LIST_TAGS,
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1565 NESTABLE_TABLE_TAGS)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1566
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1567 NESTABLE_TAGS = buildTagMap([], NESTABLE_INLINE_TAGS, NESTABLE_BLOCK_TAGS,
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1568 NESTABLE_LIST_TAGS, NESTABLE_TABLE_TAGS)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1569
    # Used to detect the charset in a META tag; see start_meta
    CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)

    def start_meta(self, attrs):
        """Beautiful Soup can detect a charset included in a META tag,
        try to convert the document to that charset, and re-parse the
        document from the beginning."""
        httpEquiv = None
        contentType = None
        contentTypeIndex = None
        tagNeedsEncodingSubstitution = False

        for i, (key, value) in enumerate(attrs):
            key = key.lower()
            if key == 'http-equiv':
                httpEquiv = value
            elif key == 'content':
                contentType = value
                contentTypeIndex = i

        if httpEquiv and contentType: # It's an interesting meta tag.
            match = self.CHARSET_RE.search(contentType)
            if match:
                if (self.declaredHTMLEncoding is not None or
                    self.originalEncoding == self.fromEncoding):
                    # An HTML encoding was sniffed while converting
                    # the document to Unicode, or an HTML encoding was
                    # sniffed during a previous pass through the
                    # document, or an encoding was specified
                    # explicitly and it worked. Rewrite the meta tag.
                    def rewrite(match):
                        return match.group(1) + "%SOUP-ENCODING%"
                    newAttr = self.CHARSET_RE.sub(rewrite, contentType)
                    attrs[contentTypeIndex] = (attrs[contentTypeIndex][0],
                                               newAttr)
                    tagNeedsEncodingSubstitution = True
                else:
                    # This is our first pass through the document.
                    # Go through it again with the encoding information.
                    newCharset = match.group(3)
                    if newCharset and newCharset != self.originalEncoding:
                        self.declaredHTMLEncoding = newCharset
                        self._feed(self.declaredHTMLEncoding)
                        raise StopParsing
        tag = self.unknown_starttag("meta", attrs)
        if tag and tagNeedsEncodingSubstitution:
            tag.containsSubstitutions = True

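The META handling in start_meta depends on CHARSET_RE. Group 3 captures the declared charset. On a later pass, the value after group 1 (the "; charset=" prefix) is replaced with the %SOUP-ENCODING% placeholder. Below is a minimal standalone sketch of those two steps using only the standard re module. The function names declared_charset and mark_charset are hypothetical and are not part of Beautiful Soup.

```python
import re

# Same pattern as BeautifulSoup.CHARSET_RE, written as a raw string.
CHARSET_RE = re.compile(r"((^|;)\s*charset=)([^;]*)", re.M)

def declared_charset(content):
    """Return the charset named in a META content value, or None."""
    match = CHARSET_RE.search(content)
    return match.group(3) if match else None

def mark_charset(content):
    """Replace the declared charset with the %SOUP-ENCODING% placeholder,
    keeping the '; charset=' prefix captured in group 1."""
    return CHARSET_RE.sub(lambda m: m.group(1) + "%SOUP-ENCODING%", content)

meta_content = "text/html; charset=ISO-8859-1"
print(declared_charset(meta_content))  # ISO-8859-1
print(mark_charset(meta_content))      # text/html; charset=%SOUP-ENCODING%
```

Because the placeholder replaces only the value, the rest of the content attribute survives unchanged. The real charset can then be filled in whenever the tree is written out in a particular encoding.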
class StopParsing(Exception):
    pass

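In start_meta, a newly declared charset triggers a complete re-parse through self._feed(...). The method then raises StopParsing so that the outer, now-stale pass stops immediately. The sketch below is a toy model of that restart pattern. It assumes, as the code above implies, that whoever runs the outer pass catches StopParsing. ToyParser and its token format are hypothetical.

```python
class StopParsing(Exception):
    """Raised to abandon the current parsing pass."""

class ToyParser:
    """Hypothetical stand-in for the restart logic in start_meta/_feed."""

    def __init__(self, tokens):
        self.tokens = tokens
        self.encoding = None
        self.declared = None
        self.passes = 0
        self.output = []

    def feed(self, encoding=None):
        self.encoding = encoding
        self.passes += 1
        self.output = []              # each pass rebuilds its result from scratch
        try:
            for token in self.tokens:
                if token.startswith("charset=") and self.declared is None:
                    # First sighting of a declaration: re-run the whole
                    # document with it, then abandon this outer pass.
                    self.declared = token.split("=", 1)[1]
                    self.feed(self.declared)
                    raise StopParsing
                self.output.append(token)
        except StopParsing:
            pass                      # the nested pass already produced the result

parser = ToyParser(["a", "charset=latin-1", "b"])
parser.feed()
print(parser.passes, parser.encoding, parser.output)
# 2 latin-1 ['a', 'charset=latin-1', 'b']
```

Using an exception lets the restart unwind out of deeply nested handler calls without threading a "stop" flag through every method.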
class ICantBelieveItsBeautifulSoup(BeautifulSoup):

    """The BeautifulSoup class is oriented towards skipping over
    common HTML errors like unclosed tags. However, sometimes it makes
    errors of its own. For instance, consider this fragment:

     <b>Foo<b>Bar</b></b>

    This is perfectly valid (if bizarre) HTML. However, the
    BeautifulSoup class will implicitly close the first b tag when it
    encounters the second 'b'. It will assume the author wrote
    "<b>Foo<b>Bar" and forgot to close the first 'b' tag, because
    there's no real-world reason to bold something that's already
    bold. When it encounters '</b></b>' it will close two more 'b'
    tags, for a grand total of three tags closed instead of two. This
    can throw off the rest of your document structure. The same is
    true of a number of other tags, listed below.

    It's much more common for someone to forget to close a 'b' tag
    than to actually use nested 'b' tags, and the BeautifulSoup class
    handles the common case. This class handles the not-so-common
    case: where you can't believe someone wrote what they did, but
    it's valid HTML and BeautifulSoup screwed up by assuming it
    wouldn't be."""

    I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS = \
     ('em', 'big', 'i', 'small', 'tt', 'abbr', 'acronym', 'strong',
      'cite', 'code', 'dfn', 'kbd', 'samp', 'var', 'b')

    I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS = ('noscript',)

    NESTABLE_TAGS = buildTagMap([], BeautifulSoup.NESTABLE_TAGS,
                                I_CANT_BELIEVE_THEYRE_NESTABLE_BLOCK_TAGS,
                                I_CANT_BELIEVE_THEYRE_NESTABLE_INLINE_TAGS)

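The docstring above contrasts two ways of handling a second `<b>` that arrives while one is already open. The first treats it as an implicit close of the first. The second treats it as a genuinely nested tag. The sketch below is a deliberately simplified model of that difference. It is not Beautiful Soup's actual algorithm, which consults NESTABLE_TAGS and RESET_NESTING_TAGS. It handles only bare tags and text, and the names build, render and TOKEN_RE are hypothetical.

```python
import re

# Tokens: an open tag, a close tag, or a run of text (no attributes).
TOKEN_RE = re.compile(r"<(/?)(\w+)>|([^<]+)")

def _close(stack, name):
    """Pop the innermost open tag called `name` and everything above it."""
    for i in range(len(stack) - 1, 0, -1):   # index 0 is the root
        if stack[i][0] == name:
            del stack[i:]
            return

def build(markup, nestable):
    root = ("root", [])
    stack = [root]
    for closing, name, text in TOKEN_RE.findall(markup):
        if text:
            stack[-1][1].append(text)
        elif closing:
            _close(stack, name)              # unmatched close tags are ignored
        else:
            if name not in nestable:
                _close(stack, name)          # a repeat implicitly closes the first
            node = (name, [])
            stack[-1][1].append(node)
            stack.append(node)
    return root

def render(node):
    name, children = node
    inner = "".join(c if isinstance(c, str) else render(c) for c in children)
    return inner if name == "root" else "<%s>%s</%s>" % (name, inner, name)

fragment = "<b>Foo<b>Bar</b></b>"
print(render(build(fragment, nestable=set())))   # <b>Foo</b><b>Bar</b>
print(render(build(fragment, nestable={"b"})))   # <b>Foo<b>Bar</b></b>
```

In this model, adding 'b' to the nestable set is the whole difference between the two trees. The class above makes the same kind of change by widening NESTABLE_TAGS.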
class MinimalSoup(BeautifulSoup):
    """The MinimalSoup class is for parsing HTML that contains
    pathologically bad markup. It makes no assumptions about tag
    nesting, but it does know which tags are self-closing, that
    <script> tags contain JavaScript and should not be parsed, that
    META tags may contain encoding information, and so on.

    Because it makes so few assumptions, it is also a better base
    class for subclassing than BeautifulStoneSoup or BeautifulSoup."""

    RESET_NESTING_TAGS = buildTagMap('noscript')
    NESTABLE_TAGS = {}

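The classes above build their nesting rules with buildTagMap. That function is defined earlier in the file and is not shown in this section. Its call sites pass a default value followed by a mix of partial maps, tuples of tag names, and single tag names. The sketch below is a hypothetical re-implementation that assumes buildTagMap merges those portions into one dict and maps bare names to the default. It is not the original code.

```python
def build_tag_map(default, *portions):
    """Merge partial maps, sequences of tag names, and single tag names.

    Hypothetical sketch of buildTagMap's behaviour, inferred from its
    call sites; not the original implementation.
    """
    built = {}
    for portion in portions:
        if isinstance(portion, dict):
            built.update(portion)            # partial map: keep its values
        elif isinstance(portion, (list, tuple, set, frozenset)):
            for name in portion:
                built[name] = default        # sequence: each name -> default
        else:
            built[portion] = default         # single tag name -> default
    return built

NESTABLE_BLOCK_TAGS = ('blockquote', 'div')
NESTABLE_LIST_TAGS = {'li': ['ul', 'ol']}
reset = build_tag_map(None, NESTABLE_BLOCK_TAGS, 'noscript', NESTABLE_LIST_TAGS)
print(reset)
```

In this sketch, later portions override earlier ones for the same tag name. That is the behaviour a subclass needs when it starts from an inherited map and adds more names.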
class BeautifulSOAP(BeautifulStoneSoup):
    """This class will push a tag with only a single string child into
    the tag's parent as an attribute. The attribute's name is the tag
    name, and the value is the string child. An example should give
    the flavor of the change:

    <foo><bar>baz</bar></foo>
     =>
    <foo bar="baz"><bar>baz</bar></foo>

    You can then access fooTag['bar'] instead of fooTag.barTag.string.

    This is, of course, useful for scraping structures that tend to
    use subelements instead of attributes, such as SOAP messages. Note
    that it modifies the parse tree, so printing the result will not
    reproduce the original document.

    I'm not sure how many people really want to use this class; let me
    know if you do. Mainly I like the name."""

    def popTag(self):
        if len(self.tagStack) > 1:
            tag = self.tagStack[-1]
            parent = self.tagStack[-2]
            parent._getAttrMap()
            if (isinstance(tag, Tag) and len(tag.contents) == 1 and
                isinstance(tag.contents[0], NavigableString) and
                tag.name not in parent.attrMap):
                parent[tag.name] = tag.contents[0]
        BeautifulStoneSoup.popTag(self)

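The same promotion rule can be illustrated outside Beautiful Soup. The sketch below uses the standard library's xml.etree.ElementTree. It applies the conditions checked in BeautifulSOAP.popTag: the child holds only a string, and the parent has no attribute with the child's name yet. It processes the innermost tags first, because popTag runs as each tag closes. The function name promote_string_children is hypothetical.

```python
import xml.etree.ElementTree as ET

def promote_string_children(parent):
    """Copy each text-only child onto its parent as an attribute.

    Hypothetical ElementTree analogue of BeautifulSOAP.popTag: a child
    qualifies when it has text but no sub-elements, and the parent does
    not already have an attribute named after the child's tag.
    """
    for child in list(parent):
        promote_string_children(child)       # innermost tags first, like popTag
        text_only = len(child) == 0 and child.text is not None
        if text_only and child.tag not in parent.attrib:
            parent.set(child.tag, child.text)
    return parent

foo = promote_string_children(ET.fromstring("<foo><bar>baz</bar></foo>"))
print(ET.tostring(foo, encoding="unicode"))  # <foo bar="baz"><bar>baz</bar></foo>
print(foo.get("bar"))                        # baz
```

As in the original class, an existing attribute is never overwritten, so explicit attributes take precedence over promoted child text.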
#Enterprise class names! It has come to our attention that some people
#think the names of the Beautiful Soup parser classes are too silly
#and "unprofessional" for use in enterprise screen-scraping. We feel
#your pain! For such-minded folk, the Beautiful Soup Consortium And
#All-Night Kosher Bakery recommends renaming this file to
#"RobustParser.py" (or, in cases of extreme enterprisiness,
#"RobustParserBeanInterface.class") and using the following
#enterprise-friendly class aliases:
class RobustXMLParser(BeautifulStoneSoup):
    pass
class RobustHTMLParser(BeautifulSoup):
    pass
class RobustWackAssHTMLParser(ICantBelieveItsBeautifulSoup):
    pass
class RobustInsanelyWackAssHTMLParser(MinimalSoup):
    pass
class SimplifyingSOAPParser(BeautifulSOAP):
    pass

f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1722 ######################################################
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1723 #
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1724 # Bonus library: Unicode, Dammit
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1725 #
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1726 # This class forces XML data into a standard format (usually to UTF-8
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1727 # or Unicode). It is heavily based on code from Mark Pilgrim's
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1728 # Universal Feed Parser. It does not rewrite the XML or HTML to
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1729 # reflect a new encoding: that happens in BeautifulStoneSoup.handle_pi
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1730 # (XML) and BeautifulSoup.start_meta (HTML).
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1731
# Autodetects character encodings.
# Download from http://chardet.feedparser.org/
try:
    import chardet
#    import chardet.constants
#    chardet.constants._debug = 1
except ImportError:
    chardet = None

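The optional import above lets the class skip statistical detection when chardet is not installed. Here is a minimal Python 3 sketch of the same pattern. It assumes the third-party `chardet` package, whose `detect()` returns a dict with an `'encoding'` key (the key `UnicodeDammit.__init__` reads). `guess_encoding` is a hypothetical helper name.

```python
try:
    import chardet  # optional third-party dependency
except ImportError:
    chardet = None  # detection is simply skipped when it is missing


def guess_encoding(data):
    """Return chardet's best guess for the bytes in `data`, or None.

    `guess_encoding` is a hypothetical helper, not part of Beautiful Soup.
    """
    if chardet is None:
        return None
    return chardet.detect(data)["encoding"]
```

Callers should treat a `None` result as "no guess" and move on to the next fallback.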
# cjkcodecs and iconv_codec add support for more character encodings.
# Both are available from http://cjkpython.i18n.org/
# They're built in as of Python 2.4.
try:
    import cjkcodecs.aliases
except ImportError:
    pass
try:
    import iconv_codec
except ImportError:
    pass

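Current CPython ships CJK codecs such as `shift_jis`, `euc-kr` and `gb2312` in its standard library. You can test whether a codec is usable with `codecs.lookup`, which raises `LookupError` for unknown names. Here is a small sketch of that check. `codec_available` is a hypothetical name.

```python
import codecs


def codec_available(name):
    """Return True if this interpreter has a codec registered under `name`."""
    try:
        codecs.lookup(name)
    except LookupError:
        return False
    return True
```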
class UnicodeDammit:
    """A class for detecting the encoding of a *ML document and
    converting it to a Unicode string. If the source encoding is
    windows-1252, it can replace MS smart quotes with their HTML or XML
    equivalents."""

    # This dictionary maps commonly seen values for "charset" in HTML
    # meta tags to the corresponding Python codec names. It only covers
    # values that aren't in Python's aliases and can't be determined
    # by the heuristics in find_codec.
    CHARSET_ALIASES = { "macintosh" : "mac-roman",
                        "x-sjis" : "shift-jis" }

    def __init__(self, markup, overrideEncodings=[],
                 smartQuotesTo='xml', isHTML=False):
        self.declaredHTMLEncoding = None
        self.markup, documentEncoding, sniffedEncoding = \
                     self._detectEncoding(markup, isHTML)
        self.smartQuotesTo = smartQuotesTo
        self.triedEncodings = []
        if markup == '' or isinstance(markup, unicode):
            self.originalEncoding = None
            self.unicode = unicode(markup)
            return

        u = None
        for proposedEncoding in overrideEncodings:
            u = self._convertFrom(proposedEncoding)
            if u: break
        if not u:
            for proposedEncoding in (documentEncoding, sniffedEncoding):
                u = self._convertFrom(proposedEncoding)
                if u: break

        # If that failed and the chardet auto-detection library is
        # available, try its guess:
        if not u and chardet and not isinstance(self.markup, unicode):
            u = self._convertFrom(chardet.detect(self.markup)['encoding'])

        # As a last resort, try utf-8 and windows-1252:
        if not u:
            for proposed_encoding in ("utf-8", "windows-1252"):
                u = self._convertFrom(proposed_encoding)
                if u: break

        self.unicode = u
        if not u: self.originalEncoding = None

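The constructor tries candidate encodings in a fixed order and stops at the first one that decodes cleanly:

1. caller-supplied overrides,
2. the declared encoding, then the sniffed one,
3. chardet's guess,
4. utf-8, then windows-1252.

It never retries a codec, because `_convertFrom` records each attempt in `triedEncodings`. Below is a Python 3 sketch of that cascade over `bytes`. It leaves out the chardet step and smart-quote handling. `decode_with_fallbacks` is a hypothetical name. The sketch assumes the canonical names returned by `codecs.lookup` are a good enough key for de-duplication.

```python
import codecs


def decode_with_fallbacks(data, overrides=(), declared=None, sniffed=None):
    """Decode `data` (bytes) with the first candidate encoding that works.

    Hypothetical helper mirroring the order used by UnicodeDammit.__init__
    (minus the chardet step). Returns (text, codec_name) or (None, None).
    """
    tried = set()
    candidates = list(overrides) + [declared, sniffed, "utf-8", "windows-1252"]
    for name in candidates:
        if not name:
            continue  # skip missing declared/sniffed values
        try:
            canonical = codecs.lookup(name).name
        except LookupError:
            continue  # unknown codec name
        if canonical in tried:
            continue  # never retry the same codec
        tried.add(canonical)
        try:
            return data.decode(canonical), canonical
        except UnicodeDecodeError:
            continue
    return None, None
```

For example, `b"caf\xe9"` is not valid UTF-8, so the sketch falls through to windows-1252 (canonical name `cp1252`).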
    def _subMSChar(self, orig):
        """Changes a MS smart quote character to an XML or HTML
        entity."""
        sub = self.MS_CHARS.get(orig)
        if isinstance(sub, tuple):
            if self.smartQuotesTo == 'xml':
                sub = '&#x%s;' % sub[1]
            else:
                sub = '&%s;' % sub[0]
        return sub

    def _convertFrom(self, proposed):
        proposed = self.find_codec(proposed)
        if not proposed or proposed in self.triedEncodings:
            return None
        self.triedEncodings.append(proposed)
        markup = self.markup

        # Convert smart quotes to XML or HTML entities (see smartQuotesTo)
        # if coming from an encoding that might have them.
        if self.smartQuotesTo and proposed.lower() in ("windows-1252",
                                                       "iso-8859-1",
                                                       "iso-8859-2"):
            markup = re.compile("([\x80-\x9f])").sub \
                     (lambda(x): self._subMSChar(x.group(1)),
                      markup)

        try:
            # print "Trying to convert document to %s" % proposed
            u = self._toUnicode(markup, proposed)
            self.markup = u
            self.originalEncoding = proposed
        except Exception, e:
            # print "That didn't work!"
            # print e
            return None
        #print "Correct encoding: %s" % proposed
        return self.markup

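Before decoding with windows-1252, iso-8859-1 or iso-8859-2, `_convertFrom` replaces every byte in the range 0x80 to 0x9F. The replacement is the entity that `_subMSChar` looks up in `MS_CHARS`, a table defined elsewhere in the class. Here is a Python 3 sketch of that step. The four-entry `SMART_QUOTES` table is a hypothetical subset covering only the curly quotes, and `convert_from` is an illustrative name.

```python
import re

# Hypothetical subset of a windows-1252 smart-quote table:
# byte -> (HTML entity name, hex code point)
SMART_QUOTES = {
    b"\x91": ("lsquo", "2018"),
    b"\x92": ("rsquo", "2019"),
    b"\x93": ("ldquo", "201C"),
    b"\x94": ("rdquo", "201D"),
}
SMART_QUOTE_ENCODINGS = ("windows-1252", "iso-8859-1", "iso-8859-2")


def _entity(byte, smart_quotes_to):
    """Map one 0x80-0x9F byte to an XML or HTML entity (as ASCII bytes)."""
    entry = SMART_QUOTES.get(byte)
    if entry is None:
        return byte  # unmapped bytes are left for the decoder
    name, hex_code = entry
    if smart_quotes_to == "xml":
        return ("&#x%s;" % hex_code).encode("ascii")
    return ("&%s;" % name).encode("ascii")


def convert_from(data, proposed, smart_quotes_to="xml"):
    """Decode `data` as `proposed`, substituting smart quotes first.

    Returns the decoded str, or None if the codec is unknown or fails.
    """
    if smart_quotes_to and proposed.lower() in SMART_QUOTE_ENCODINGS:
        data = re.sub(rb"([\x80-\x9f])",
                      lambda m: _entity(m.group(1), smart_quotes_to),
                      data)
    try:
        return data.decode(proposed)
    except (LookupError, UnicodeDecodeError):
        return None
```

The substitution runs on the raw bytes, so the ASCII entities pass through decoding unchanged.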
    def _toUnicode(self, data, encoding):
        '''Given a string and its encoding, decodes the string into Unicode.
        %encoding is a string recognized by encodings.aliases'''

        # strip Byte Order Mark (if present)
        if (len(data) >= 4) and (data[:2] == '\xfe\xff') \
               and (data[2:4] != '\x00\x00'):
            encoding = 'utf-16be'
            data = data[2:]
        elif (len(data) >= 4) and (data[:2] == '\xff\xfe') \
                 and (data[2:4] != '\x00\x00'):
            encoding = 'utf-16le'
            data = data[2:]
        elif data[:3] == '\xef\xbb\xbf':
            encoding = 'utf-8'
            data = data[3:]
        elif data[:4] == '\x00\x00\xfe\xff':
            encoding = 'utf-32be'
            data = data[4:]
        elif data[:4] == '\xff\xfe\x00\x00':
            encoding = 'utf-32le'
            data = data[4:]
        newdata = unicode(data, encoding)
        return newdata

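`_toUnicode` lets a byte order mark override the proposed encoding, and strips the mark before decoding. Here is a Python 3 sketch using the BOM constants from the standard `codecs` module. It is a close but not identical variant of the original:

- It checks the four-byte UTF-32 marks first, so `FF FE 00 00` is not read as a UTF-16LE mark. The original instead tests `data[2:4] != '\x00\x00'`.
- It does not require at least four bytes before recognizing a UTF-16 mark.

`to_unicode` is a hypothetical name.

```python
import codecs

# Longest marks first, so a UTF-32LE mark (FF FE 00 00) is not mistaken
# for a UTF-16LE mark (FF FE).
_BOMS = (
    (codecs.BOM_UTF32_BE, "utf-32-be"),
    (codecs.BOM_UTF32_LE, "utf-32-le"),
    (codecs.BOM_UTF8, "utf-8"),
    (codecs.BOM_UTF16_BE, "utf-16-be"),
    (codecs.BOM_UTF16_LE, "utf-16-le"),
)


def to_unicode(data, encoding):
    """Decode `data`, letting a leading BOM override `encoding`."""
    for bom, bom_encoding in _BOMS:
        if data.startswith(bom):
            # Strip the mark and decode with the encoding it identifies
            return data[len(bom):].decode(bom_encoding)
    return data.decode(encoding)
```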
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1864 def _detectEncoding(self, xml_data, isHTML=False):
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1865 """Given a document, tries to detect its XML encoding."""
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1866 xml_encoding = sniffed_xml_encoding = None
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1867 try:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1868 if xml_data[:4] == '\x4c\x6f\xa7\x94':
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1869 # EBCDIC
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1870 xml_data = self._ebcdic_to_ascii(xml_data)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1871 elif xml_data[:4] == '\x00\x3c\x00\x3f':
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1872 # UTF-16BE
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1873 sniffed_xml_encoding = 'utf-16be'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1874 xml_data = unicode(xml_data, 'utf-16be').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1875 elif (len(xml_data) >= 4) and (xml_data[:2] == '\xfe\xff') \
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1876 and (xml_data[2:4] != '\x00\x00'):
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1877 # UTF-16BE with BOM
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1878 sniffed_xml_encoding = 'utf-16be'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1879 xml_data = unicode(xml_data[2:], 'utf-16be').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1880 elif xml_data[:4] == '\x3c\x00\x3f\x00':
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1881 # UTF-16LE
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1882 sniffed_xml_encoding = 'utf-16le'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1883 xml_data = unicode(xml_data, 'utf-16le').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1884 elif (len(xml_data) >= 4) and (xml_data[:2] == '\xff\xfe') and \
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1885 (xml_data[2:4] != '\x00\x00'):
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1886 # UTF-16LE with BOM
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1887 sniffed_xml_encoding = 'utf-16le'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1888 xml_data = unicode(xml_data[2:], 'utf-16le').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1889 elif xml_data[:4] == '\x00\x00\x00\x3c':
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1890 # UTF-32BE
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1891 sniffed_xml_encoding = 'utf-32be'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1892 xml_data = unicode(xml_data, 'utf-32be').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1893 elif xml_data[:4] == '\x3c\x00\x00\x00':
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1894 # UTF-32LE
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1895 sniffed_xml_encoding = 'utf-32le'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1896 xml_data = unicode(xml_data, 'utf-32le').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1897 elif xml_data[:4] == '\x00\x00\xfe\xff':
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1898 # UTF-32BE with BOM
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1899 sniffed_xml_encoding = 'utf-32be'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1900 xml_data = unicode(xml_data[4:], 'utf-32be').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1901 elif xml_data[:4] == '\xff\xfe\x00\x00':
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1902 # UTF-32LE with BOM
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1903 sniffed_xml_encoding = 'utf-32le'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1904 xml_data = unicode(xml_data[4:], 'utf-32le').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1905 elif xml_data[:3] == '\xef\xbb\xbf':
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1906 # UTF-8 with BOM
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1907 sniffed_xml_encoding = 'utf-8'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1908 xml_data = unicode(xml_data[3:], 'utf-8').encode('utf-8')
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1909 else:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1910 sniffed_xml_encoding = 'ascii'
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1911 pass
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1912 except:
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1913 xml_encoding_match = None
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1914 xml_encoding_match = re.compile(
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1915 '^<\?.*encoding=[\'"](.*?)[\'"].*\?>').match(xml_data)
f02e37f395ae Added ability to add files from the `hg status` window.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1916 if not xml_encoding_match and isHTML:
            regexp = re.compile('<\s*meta[^>]+charset=([^>]*?)[;\'">]', re.I)
            xml_encoding_match = regexp.search(xml_data)
        if xml_encoding_match is not None:
            xml_encoding = xml_encoding_match.groups()[0].lower()
            if isHTML:
                self.declaredHTMLEncoding = xml_encoding
            if sniffed_xml_encoding and \
               (xml_encoding in ('iso-10646-ucs-2', 'ucs-2', 'csunicode',
                                 'iso-10646-ucs-4', 'ucs-4', 'csucs4',
                                 'utf-16', 'utf-32', 'utf_16', 'utf_32',
                                 'utf16', 'u16')):
                xml_encoding = sniffed_xml_encoding
        return xml_data, xml_encoding, sniffed_xml_encoding
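The tail of `_detectEncoding` falls back to an HTML `<meta>` charset declaration and then lets a sniffed byte-order encoding override a declared UCS-2/UCS-4/UTF-16/UTF-32 name. A minimal Python 3 sketch of that decision, assuming bytes input; `declared_encoding` and `WIDE_ENCODINGS` are hypothetical names, not part of Beautiful Soup:

```python
import re

# Same pattern as the original, compiled for bytes (hypothetical Python 3 port).
META_CHARSET = re.compile(br'<\s*meta[^>]+charset=([^>]*?)[;\'">]', re.I)

# Declared names for which a sniffed byte-order encoding is trusted instead.
WIDE_ENCODINGS = ('iso-10646-ucs-2', 'ucs-2', 'csunicode',
                  'iso-10646-ucs-4', 'ucs-4', 'csucs4',
                  'utf-16', 'utf-32', 'utf_16', 'utf_32',
                  'utf16', 'u16')


def declared_encoding(data, sniffed=None):
    """Hypothetical helper: return the encoding to use, or None if none is declared."""
    match = META_CHARSET.search(data)
    if match is None:
        return None
    declared = match.group(1).decode('ascii', 'replace').lower()
    # A declared "wide" encoding loses to the sniffed one, as in the original.
    if sniffed and declared in WIDE_ENCODINGS:
        return sniffed
    return declared


print(declared_encoding(
    b'<meta http-equiv="Content-Type" content="text/html; charset=ISO-8859-1">'))
```

The pattern targets the `content="text/html; charset=..."` form. With a quoted HTML5-style `<meta charset="utf-8">` the lazy group stops at the opening quote and captures an empty string.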


    def find_codec(self, charset):
        return self._codec(self.CHARSET_ALIASES.get(charset, charset)) \
               or (charset and self._codec(charset.replace("-", ""))) \
               or (charset and self._codec(charset.replace("-", "_"))) \
               or charset

    def _codec(self, charset):
        if not charset: return charset
        codec = None
        try:
            codecs.lookup(charset)
            codec = charset
        except (LookupError, ValueError):
            pass
        return codec

    EBCDIC_TO_ASCII_MAP = None
    def _ebcdic_to_ascii(self, s):
        c = self.__class__
        if not c.EBCDIC_TO_ASCII_MAP:
            emap = (0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
                    16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
                    128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
                    144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
                    32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
                    38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
                    45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
                    186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
                    195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,
                    201,202,106,107,108,109,110,111,112,113,114,203,204,205,
                    206,207,208,209,126,115,116,117,118,119,120,121,122,210,
                    211,212,213,214,215,216,217,218,219,220,221,222,223,224,
                    225,226,227,228,229,230,231,123,65,66,67,68,69,70,71,72,
                    73,232,233,234,235,236,237,125,74,75,76,77,78,79,80,81,
                    82,238,239,240,241,242,243,92,159,83,84,85,86,87,88,89,
                    90,244,245,246,247,248,249,48,49,50,51,52,53,54,55,56,57,
                    250,251,252,253,254,255)
            import string
            c.EBCDIC_TO_ASCII_MAP = string.maketrans(
                ''.join(map(chr, range(256))), ''.join(map(chr, emap)))
        return s.translate(c.EBCDIC_TO_ASCII_MAP)
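`_ebcdic_to_ascii` builds a 256-entry byte translation table once, caches it on the class, and converts a whole string with a single `translate` call. A Python 3 sketch of the same idea uses `bytes.maketrans`, since Python 2's `string.maketrans` no longer exists there; `EbcdicConverter` and `EMAP` are hypothetical names, and the table values are copied from `emap` above:

```python
class EbcdicConverter(object):
    """Hypothetical Python 3 port of the lazily cached EBCDIC table."""

    # Built on first use, then shared by every instance.
    EBCDIC_TO_ASCII_MAP = None

    # Copied from emap: index = EBCDIC byte, value = output byte.
    EMAP = (0,1,2,3,156,9,134,127,151,141,142,11,12,13,14,15,
            16,17,18,19,157,133,8,135,24,25,146,143,28,29,30,31,
            128,129,130,131,132,10,23,27,136,137,138,139,140,5,6,7,
            144,145,22,147,148,149,150,4,152,153,154,155,20,21,158,26,
            32,160,161,162,163,164,165,166,167,168,91,46,60,40,43,33,
            38,169,170,171,172,173,174,175,176,177,93,36,42,41,59,94,
            45,47,178,179,180,181,182,183,184,185,124,44,37,95,62,63,
            186,187,188,189,190,191,192,193,194,96,58,35,64,39,61,34,
            195,97,98,99,100,101,102,103,104,105,196,197,198,199,200,
            201,202,106,107,108,109,110,111,112,113,114,203,204,205,
            206,207,208,209,126,115,116,117,118,119,120,121,122,210,
            211,212,213,214,215,216,217,218,219,220,221,222,223,224,
            225,226,227,228,229,230,231,123,65,66,67,68,69,70,71,72,
            73,232,233,234,235,236,237,125,74,75,76,77,78,79,80,81,
            82,238,239,240,241,242,243,92,159,83,84,85,86,87,88,89,
            90,244,245,246,247,248,249,48,49,50,51,52,53,54,55,56,57,
            250,251,252,253,254,255)

    def ebcdic_to_ascii(self, data):
        cls = self.__class__
        if cls.EBCDIC_TO_ASCII_MAP is None:
            # Identity byte order on the left, mapped bytes on the right.
            cls.EBCDIC_TO_ASCII_MAP = bytes.maketrans(bytes(range(256)),
                                                      bytes(cls.EMAP))
        return data.translate(cls.EBCDIC_TO_ASCII_MAP)


print(EbcdicConverter().ebcdic_to_ascii(b'\xc8\x85\x93\x93\x96'))
```

For letters and digits this table agrees with Python's built-in `cp500` codec, but many other positions differ (for example byte 0x42), so the two are not interchangeable.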

    MS_CHARS = { '\x80' : ('euro', '20AC'),
                 '\x81' : ' ',
                 '\x82' : ('sbquo', '201A'),
                 '\x83' : ('fnof', '192'),
                 '\x84' : ('bdquo', '201E'),
                 '\x85' : ('hellip', '2026'),
                 '\x86' : ('dagger', '2020'),
                 '\x87' : ('Dagger', '2021'),
                 '\x88' : ('circ', '2C6'),
                 '\x89' : ('permil', '2030'),
                 '\x8A' : ('Scaron', '160'),
                 '\x8B' : ('lsaquo', '2039'),
                 '\x8C' : ('OElig', '152'),
                 '\x8D' : '?',
                 '\x8E' : ('#x17D', '17D'),
                 '\x8F' : '?',
                 '\x90' : '?',
                 '\x91' : ('lsquo', '2018'),
                 '\x92' : ('rsquo', '2019'),
                 '\x93' : ('ldquo', '201C'),
                 '\x94' : ('rdquo', '201D'),
                 '\x95' : ('bull', '2022'),
                 '\x96' : ('ndash', '2013'),
                 '\x97' : ('mdash', '2014'),
                 '\x98' : ('tilde', '2DC'),
                 '\x99' : ('trade', '2122'),
                 '\x9a' : ('scaron', '161'),
                 '\x9b' : ('rsaquo', '203A'),
                 '\x9c' : ('oelig', '153'),
                 '\x9d' : '?',
                 '\x9e' : ('#x17E', '17E'),
                 '\x9f' : ('Yuml', '178'),}
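Each `MS_CHARS` entry maps a Windows-1252 byte in the 0x80-0x9F range either to an `(entity name, hex code point)` pair or to a plain replacement string. A minimal Python 3 sketch of how such a table can drive a substitution over bytes; `MS_SUBSET` and `sub_ms_chars` are hypothetical, and the `'html'`/`'xml'` switch between named and numeric references is an assumption for illustration:

```python
import re

# Hypothetical subset of a table shaped like MS_CHARS.
MS_SUBSET = {
    b'\x80': ('euro', '20AC'),
    b'\x81': ' ',
    b'\x93': ('ldquo', '201C'),
    b'\x94': ('rdquo', '201D'),
}

# Every byte in the Windows-1252 range the table covers.
MS_RANGE = re.compile(b'[\x80-\x9f]')


def sub_ms_chars(data, mode='html'):
    """Replace 0x80-0x9F bytes listed in MS_SUBSET; leave all other bytes alone."""
    def replace(match):
        sub = MS_SUBSET.get(match.group(0))
        if sub is None:
            return match.group(0)
        if isinstance(sub, tuple):
            if mode == 'xml':
                # Numeric character reference, e.g. &#x201C;
                return ('&#x%s;' % sub[1]).encode('ascii')
            # Named entity reference, e.g. &ldquo;
            return ('&%s;' % sub[0]).encode('ascii')
        # Plain replacement string.
        return sub.encode('ascii')
    return MS_RANGE.sub(replace, data)


print(sub_ms_chars(b'\x93hi\x94', mode='xml'))
```

The numeric form is also why every tuple needs a non-empty code point: an empty one would yield the malformed reference `&#x;`.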

#######################################################################


#By default, act as an HTML pretty-printer.
if __name__ == '__main__':
    import sys
    soup = BeautifulSoup(sys.stdin)
    print soup.prettify()