annotate tests/test_commands_populate.py @ 27:c898b4df0f29

Use context for html stripping, with support for custom URL sizes For instance in Twitter URLs are 23 characters long since they use their own URL shortening service. Without taking this into account, post lengths would not be calculated correctly.
author Ludovic Chabant <ludovic@chabant.com>
date Wed, 19 Apr 2023 12:46:58 -0700
parents a921cc2306bc
children b739ca5feb45
Ignore whitespace changes - Everywhere: Within whitespace: At end of lines:
rev   line source
0
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
1
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
2 feed1 = """
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
3 <html><body>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
4 <article class="h-entry">
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
5 <h1 class="p-name">A new article</h1>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
6 <div class="e-content">
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
7 <p>This is the text of the article.</p>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
8 <p>It has 2 paragraphs.</p>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
9 </div>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
10 <a class="u-url" href="https://example.org/a-new-article">permalink</a>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
11 </article>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
12 </body></html>"""
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
13
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
14
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
15 def test_populate(cli):
18
a921cc2306bc Do our own HTML parsing/stripping of micropost contents.
Ludovic Chabant <ludovic@chabant.com>
parents: 0
diff changeset
16 feed = cli.createTempFeed(feed1)
0
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
17 cli.appendSiloConfig('test', 'print', items='name')
18
a921cc2306bc Do our own HTML parsing/stripping of micropost contents.
Ludovic Chabant <ludovic@chabant.com>
parents: 0
diff changeset
18 cli.setFeedConfig('feed', feed)
a921cc2306bc Do our own HTML parsing/stripping of micropost contents.
Ludovic Chabant <ludovic@chabant.com>
parents: 0
diff changeset
19 ctx, _ = cli.run('populate', '-s', 'test')
0
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
20 assert ctx.cache.wasPosted('test', 'https://example.org/a-new-article')
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
21
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
22
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
23 feed2 = """
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
24 <html><body>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
25 <article class="h-entry">
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
26 <h1 class="p-name">First article</h1>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
27 <div><time class="dt-published" datetime="2018-01-07T09:30:00-00:00"></time></div>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
28 <a class="u-url" href="https://example.org/first-article">permalink</a>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
29 </article>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
30 <article class="h-entry">
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
31 <h1 class="p-name">Second article</h1>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
32 <div><time class="dt-published" datetime="2018-01-08T09:30:00-00:00"></time></div>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
33 <a class="u-url" href="https://example.org/second-article">permalink</a>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
34 </article>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
35 <article class="h-entry">
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
36 <h1 class="p-name">Third article</h1>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
37 <div><time class="dt-published" datetime="2018-01-09T09:30:00-00:00"></time></div>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
38 <a class="u-url" href="https://example.org/third-article">permalink</a>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
39 </article>
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
40 </body></html>""" # NOQA
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
41
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
42
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
43 def test_populate_until(cli):
18
a921cc2306bc Do our own HTML parsing/stripping of micropost contents.
Ludovic Chabant <ludovic@chabant.com>
parents: 0
diff changeset
44 feed = cli.createTempFeed(feed2)
0
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
45 cli.appendSiloConfig('test', 'print', items='name')
18
a921cc2306bc Do our own HTML parsing/stripping of micropost contents.
Ludovic Chabant <ludovic@chabant.com>
parents: 0
diff changeset
46 cli.setFeedConfig('feed', feed)
a921cc2306bc Do our own HTML parsing/stripping of micropost contents.
Ludovic Chabant <ludovic@chabant.com>
parents: 0
diff changeset
47 ctx, _ = cli.run('populate', '-s', 'test', '--until', '2018-01-08')
0
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
48 assert ctx.cache.wasPosted('test', 'https://example.org/first-article')
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
49 assert ctx.cache.wasPosted('test', 'https://example.org/second-article')
a1b7a459326a Initial commit.
Ludovic Chabant <ludovic@chabant.com>
parents:
diff changeset
50 assert not ctx.cache.wasPosted('test', 'https://example.org/third-article')