Commit 31d1cdf
ETL: Filter headers and footers out of documents
These patterns are common to all of the Oral History Center interview
PDFs that I was able to find. Processing 7 documents with these new
filters added exactly 0.04s of runtime on my workstation, so there is
not a large performance impact.
Implements: AP-4551
parent 6e4ea80, commit 31d1cdf
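
For readers who have not seen the change itself, line-level header and footer filtering of this kind can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the pattern list and the function name `strip_headers_and_footers` are hypothetical, and the real patterns for the Oral History Center PDFs are not shown on this page.

```python
import re

# Hypothetical patterns, for illustration only. The commit's real
# patterns are not visible here and may look quite different.
HEADER_FOOTER_PATTERNS = [
    re.compile(r"^\s*Page \d+( of \d+)?\s*$"),  # "Page 3" or "Page 3 of 40"
    re.compile(r"^\s*\d+\s*$"),                 # a bare page number
    re.compile(r"^\s*Copyright\b.*$"),          # a copyright footer line
]


def strip_headers_and_footers(text: str) -> str:
    """Return text with every line that matches a pattern removed."""
    kept = [
        line
        for line in text.splitlines()
        if not any(p.match(line) for p in HEADER_FOOTER_PATTERNS)
    ]
    return "\n".join(kept)


if __name__ == "__main__":
    page = "Page 1 of 3\nInterviewer: Hello.\nCopyright 2019 Example\nNarrator: Hi."
    print(strip_headers_and_footers(page))
```

Precompiling the patterns once at import time keeps the per-line cost to a handful of regex matches, which fits with the small runtime overhead reported in the commit message.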
2 files changed: 22 additions, 2 deletions
(file names and line contents are not preserved; only the hunk positions survive)

First file:
@@ -5,6 +5,7 @@ (1 line added)
@@ -27,6 +28,24 @@ (18 lines added)
@@ -45,7 +64,7 @@ (1 line changed)
@@ -91,7 +110,7 @@ (1 line changed)

Second file:
@@ -718,6 +718,7 @@ (1 line added)