The Camp Fire Girls Solve A Mystery—Text Post Processing Notes

Some notes on the preparation of an F2-style output text and associated transcriber's notes.

Text Post Processing

Import

Use UNIX line endings and convert to UTF-8, handling simple transliterations (e.g., --). Results: text and good word list.
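
The import step could be sketched as follows. This is a minimal illustration, not the tool actually used: the function name normalise_import, the latin-1 source encoding, the reading of "--" as an em dash, and the "-----File:" page-marker prefix are all assumptions.

```python
def normalise_import(raw: bytes, encoding: str = "latin-1") -> str:
    """Decode the raw text, normalise line endings to LF, transliterate '--'."""
    # Assumption: the downloaded text is latin-1; pass another codec if not.
    text = raw.decode(encoding).replace("\r\n", "\n").replace("\r", "\n")
    lines = []
    for line in text.split("\n"):
        # Assumption: page markers start with "-----File:" and must be left alone.
        if not line.startswith("-----File:"):
            # Assumption: a double hyphen stands for an em dash (U+2014).
            line = line.replace("--", "\u2014")
        lines.append(line)
    return "\n".join(lines)


# The result is a str; encode it as UTF-8 when writing it out.
utf8_bytes = normalise_import(b"Wait--what?\r\nYes.\r\n").encode("utf-8")
```

Writing utf8_bytes in binary mode keeps the LF line endings intact on every platform.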

Correct the output of the rounds

Fix any mistakes missed by P3/F2. Results: text, or as a diff.

Correct printer's errors

Fix printer's errors as noted by proofreaders. Notes detailing fixed errors are transformed to <corr> form; notes detailing unfixed errors are left untransformed. Results: text.

Validation

Validate the resulting text. Result: log.

Identify quotation marks

Identify quotation marks and replace them with their curly forms. Apostrophes are left unchanged. Inputs: PG #10688, PG #36485, PG #36833, good words, bad words. Result: text, word analysis log.
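
As a rough illustration of the curling rule only, the sketch below treats a straight double quote at the start of a line, or after whitespace or an opening bracket, as an opening quote and every other one as a closing quote. It is far simpler than the step described above (it uses no reference texts or word lists), and the function name curl_double_quotes is hypothetical.

```python
import re


def curl_double_quotes(text: str) -> str:
    """Replace straight double quotes with curly ones; leave ' untouched."""
    # Opening quote: at the start of a line, or after whitespace, ( or [.
    text = re.sub(r'(^|[\s(\[])"', "\\1\u201c", text, flags=re.M)
    # Anything still straight is assumed to be a closing quote.
    return text.replace('"', "\u201d")
```

Quotes that follow a dash are ambiguous under this rule and would need checking by hand.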

Handle hyphens

Rejoin words split across pages. Identify words marked with -* and resolve them. Inputs: good words. Result: text, resolution log.
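
One way the resolution against the good words list might work is sketched below. The name resolve_hyphens and the preference order (hyphenated form first, then the joined form) are assumptions; unresolved words are left marked so they show up in the log.

```python
import re

HYPHEN_STAR = re.compile(r"(\w+)-\*(\w+)")


def resolve_hyphens(text, good_words):
    """Resolve left-*right markers using the good words list.

    Returns the new text and a log of (marked form, chosen form) pairs.
    """
    log = []

    def decide(match):
        left, right = match.group(1), match.group(2)
        hyphenated, joined = left + "-" + right, left + right
        if hyphenated in good_words:
            choice = hyphenated
        elif joined in good_words:
            choice = joined
        else:
            choice = match.group(0)  # unresolved: keep the marker for review
        log.append((match.group(0), choice))
        return choice

    return HYPHEN_STAR.sub(decide, text), log
```

For example, with good words {"to-day", "campfire"}, "to-*day" becomes "to-day", "camp-*fire" becomes "campfire", and any other marked word is logged unchanged.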

Handle notes

Remove notes that we've handled. Inputs: handled notes. Result: text.

Word frequency analysis

Generate a word list and, for each word, a matching base word formed by eliding diacritical marks and hyphens. Replace words identified as bad. Inputs: bad words. Result: text, log.
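
The base-word idea can be sketched as below: strip combining marks after Unicode decomposition, drop hyphens, and group the spellings that share a base. The names base_word and variant_groups and the word-matching pattern are assumptions made for illustration.

```python
import re
import unicodedata
from collections import Counter

# Assumed definition of a word: letters, optionally joined by - or '.
WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def base_word(word):
    """Elide diacritical marks and hyphens from a word."""
    decomposed = unicodedata.normalize("NFD", word)
    stripped = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    return stripped.replace("-", "")


def variant_groups(text):
    """Map each base word to its spellings and counts, keeping only
    bases that occur in more than one spelling."""
    groups = {}
    for word, count in Counter(WORD.findall(text)).items():
        groups.setdefault(base_word(word), {})[word] = count
    return {base: forms for base, forms in groups.items() if len(forms) > 1}
```

Groups such as "to-day" against "today" are the candidates worth reviewing when deciding which spellings go on the bad words list.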

Transcriber's notes

Use the <corr> markers to generate a list of changes made to the original text and remove the markers from the main text. Result: notes, text.
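
A minimal sketch of this step follows. The exact form of the <corr> markers is not given in these notes, so the sketch assumes a hypothetical form, <corr sic="original">corrected</corr>; the function name and the wording of each note are likewise illustrative.

```python
import re

# Hypothetical marker form: <corr sic="original">corrected</corr>
CORR = re.compile(r'<corr sic="([^"]*)">(.*?)</corr>', re.S)


def extract_corrections(text):
    """Return (text with markers removed, list of transcriber's notes)."""
    notes = []

    def unwrap(match):
        original, corrected = match.group(1), match.group(2)
        notes.append(f'"{original}" changed to "{corrected}"')
        return corrected  # keep only the corrected reading in the main text

    return CORR.sub(unwrap, text), notes
```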

Export

Use DOS line endings. Final result: notes and text.

Checking (ASCII)

Rejoin pages

Strip page markers & blank pages, and join blocks that cross pages. Result: text.
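
The page-rejoining step might look like the sketch below. It assumes page markers are lines starting with "-----File:", blank pages are marked "[Blank Page]", and blocks are closed with */, #/, $/ and p/; none of these details are stated in these notes.

```python
PAGE_MARKER = "-----File:"
# Assumed closing marker for each opening block marker.
REOPENS = {"/*": "*/", "/#": "#/", "/$": "$/", "/p": "p/"}


def rejoin_pages(text):
    """Drop page markers and blank-page lines; merge blocks split by a page break."""
    out = []
    crossed = False  # True just after a page marker has been dropped
    for line in text.split("\n"):
        if line.startswith(PAGE_MARKER):
            crossed = True
            continue
        if line.strip() == "[Blank Page]":
            continue
        if crossed and out and REOPENS.get(line) == out[-1]:
            # The block closed at the foot of the last page reopens here:
            # remove the close and skip the reopen so it reads as one block.
            out.pop()
        else:
            out.append(line)
        crossed = False
    return "\n".join(out)
```

A paragraph that runs across a page break needs no special handling here: once the marker line is gone, its two halves are adjacent lines of the same paragraph.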

ASCII transliteration

Convert to ASCII by transliterating Unicode. Result: text.
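
A simple transliteration could be sketched as follows, using the standard unicodedata module. The punctuation table and the name to_ascii are assumptions; note that any character without an ASCII decomposition is silently dropped, so the output should be compared against the UTF-8 text.

```python
import unicodedata

# Assumed punctuation mapping; extend as the text requires.
PUNCTUATION = {
    "\u2018": "'", "\u2019": "'",   # curly single quotes
    "\u201c": '"', "\u201d": '"',   # curly double quotes
    "\u2014": "--", "\u2013": "-",  # em and en dashes
}


def to_ascii(text):
    """Transliterate punctuation, then strip diacritics via NFKD."""
    text = "".join(PUNCTUATION.get(ch, ch) for ch in text)
    decomposed = unicodedata.normalize("NFKD", text)
    # Combining marks (and anything else non-ASCII) are discarded here.
    return decomposed.encode("ascii", "ignore").decode("ascii")
```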

Convert to plain text

Convert to plain text by indenting /# and /p blocks, and handling <tb>, <sc>, <i>, & <b> (<f> and <g> are discarded). Result: text.
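
The sketch below shows one possible rendering. The conventions chosen are assumptions, not taken from these notes: <i> becomes _text_, <b> becomes =text=, <sc> becomes upper case, <tb> becomes a row of asterisks, only the <f> and <g> tags (not their contents) are discarded, and block contents are indented by four spaces with #/ and p/ as closing markers.

```python
import re


def inline_to_plain(text):
    """Render inline markup using assumed plain-text conventions."""
    text = re.sub(r"<i>(.*?)</i>", r"_\1_", text, flags=re.S)
    text = re.sub(r"<b>(.*?)</b>", r"=\1=", text, flags=re.S)
    text = re.sub(r"<sc>(.*?)</sc>", lambda m: m.group(1).upper(), text, flags=re.S)
    text = re.sub(r"</?[fg]>", "", text)  # drop the tags, keep their contents
    return text.replace("<tb>", "*       *       *       *       *")


def indent_blocks(text, width=4):
    """Indent the contents of /# and /p blocks; the markers themselves stay."""
    out, depth = [], 0
    for line in text.split("\n"):
        if line in ("/#", "/p"):
            depth += 1
            out.append(line)
        elif line in ("#/", "p/"):
            depth = max(0, depth - 1)
            out.append(line)
        else:
            out.append(" " * width * depth + line if line else line)
    return "\n".join(out)
```

The block markers are kept at this stage because the later re-wrap step still needs them.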

Re-wrap

Re-wrap text to 72 columns honouring /*, /$ and /p no-wrap blocks. Result: text.
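
Re-wrapping with no-wrap blocks can be sketched with the standard textwrap module, as below. The closing markers (*/, $/, p/) and the pass-through treatment of /# and #/ lines are assumptions; each paragraph keeps the indentation of its first line.

```python
import textwrap

# Assumed closing marker for each no-wrap opening marker.
NOWRAP = {"/*": "*/", "/$": "$/", "/p": "p/"}
PASS_THROUGH = {"/#", "#/"}  # marker lines that must not be wrapped into text


def rewrap(text, width=72):
    """Re-wrap paragraphs to `width`, copying no-wrap blocks verbatim."""
    out, para, closer = [], [], None

    def flush():
        if para:
            first = para[0]
            indent = first[: len(first) - len(first.lstrip())]
            joined = " ".join(s.strip() for s in para)
            out.extend(textwrap.wrap(joined, width=width,
                                     initial_indent=indent,
                                     subsequent_indent=indent))
            para.clear()

    for line in text.split("\n"):
        if closer is not None:          # inside a no-wrap block
            out.append(line)
            if line == closer:
                closer = None
        elif line in NOWRAP:            # a no-wrap block opens
            flush()
            out.append(line)
            closer = NOWRAP[line]
        elif line in PASS_THROUGH or not line.strip():
            flush()
            out.append(line)
        else:
            para.append(line)
    flush()
    return "\n".join(out)
```

By default textwrap.wrap may break at hyphens inside words; pass break_on_hyphens=False if that is not wanted.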

Discard block markers

Discard /*, /$, /p, and /# block markers. Result: text.

Export

Use DOS line endings. Result: text.

Run gutcheck

Run gutcheck. Result: log.

Run jeebies

Run jeebies. Result: log.

Checking (UTF-8)

Rejoin pages

Strip page markers & blank pages, and join blocks that cross pages. Result: text.

Convert to plain text

Convert to plain text by indenting /# and /p blocks, and handling <tb>, <sc>, <i>, & <b> (<f> and <g> are discarded). Result: text.

Re-wrap

Re-wrap text to 72 columns honouring /*, /$ and /p no-wrap blocks. Result: text.

Discard block markers

Discard /*, /$, /p, and /# block markers. Result: text.

Spell check

Run a spell check honouring the project's good words list. Result: list of possible misspellings.
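
How the good words list interacts with the spell check can be sketched as below. The spell checker actually used is not named in these notes, so a plain set of known words stands in for its dictionary; the function name and the word pattern are also assumptions.

```python
import re

# Assumed definition of a word: letters, optionally joined by - or '.
WORD = re.compile(r"[^\W\d_]+(?:[-'][^\W\d_]+)*")


def possible_misspellings(text, dictionary, good_words):
    """List words found in neither the dictionary nor the good words list."""
    known = {w.lower() for w in dictionary}
    suspects = set()
    for word in WORD.findall(text):
        # Good words match exactly; dictionary words match case-insensitively.
        if word in good_words or word.lower() in known:
            continue
        suspects.add(word)
    return sorted(suspects)
```

For example, with "Sahwah" on the good words list, the text "Sahwah paddled teh canoe" would report only "teh".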