-
Notifications
You must be signed in to change notification settings - Fork 14
/
Copy pathTODO.TXT
68 lines (51 loc) · 1.95 KB
/
TODO.TXT
1
2
3
4
5
6
7
8
9
10
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
28
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
61
62
63
64
65
66
67
68
========================
templatemaker to-do list
========================
templatemaker.c
===============
* Is there a more efficient algorithm for the longest_match() C function?
* Provide a Python fallback module.
templatemaker.py
================
* Allow for user definition of symbols that would be treated as "single"
units, instead of always using each character as a separate symbol for
comparison.
* Create template, then load all known documents into it to get statistics
on which holes differ more frequently. Can't do this while loading
documents because a template's holes are not yet defined.
* Keep track of which holes have only 2 to 5 possible values, and mark those
in a special way -- "option holes."
* Keep track of hole values in known documents, and mark certain holes as
"pointless" if their values are duplicates.
* Don't save \r\n\r\n\r\n\r\n.
* Automatically determine which fields are dates/times, and have extract()
convert to datetime objects.
* If learn() creates more holes, it should provide the PREVIOUS value for
that hole (before it was a hole) for all previous documents. Thus,
previous documents will be able to account for the new hole with the
correct value. For example:
>>> t = Template()
>>> t.learn('<b>this and that</b><br>')
>>> t.as_text('!')
'<b>this and that</b><br>'
>>> t.learn('<b>alex and sue</b><br>')
True
>>> t.as_text('!')
'<b>! and !</b><br>'
>>> t.last_change
{0: 'this', 1: 'that'}
>>> t.learn('<b>michael and scottie</b>')
True
>>> t.as_text('!')
'<b>! and !</b>!'
>>> t.last_change
{2: '<br>'}
>>> t.learn('<b>foo & bar</b><br>')
True
>>> t.as_text('!')
'<b>! ! !</b>!'
>>> t.last_change
{1: 'and'}
last_change is an attribute rather than being a return value of learn() to
keep things clean. The value of last_change is overridden each time learn() is
called.