Tuesday, February 22, 2011


Automate correcting typos

Many of the typographical errors that appear when you OCR a scanned book, or that are simply present in a book that has not gone through thorough editing, can in most cases be corrected automatically with regular expressions.

Let's look at a few simple examples.

# Period not followed by a space
Expression: "\\.([A-Z])"
Replacement: ". $1"

# Space before a closing question mark
Expression: "\\s\\?"
Replacement: "?"

# Dash inside a word: replace it with a dash
Expression: "([A-Za-zÁÉÍÓÚÑáéíóúü])-([A-Za-zÁÉÍÓÚÑáéíóúü])"
Replacement: "$1-$2"
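As a rough illustration, the first two rules above can be tried out in a few lines of Python. This is only a sketch: the names `RULES` and `apply_rules` are made up for the example, and Python writes group references as `\1` where the rules above use `$1`.

```python
import re

# (pattern, replacement) pairs mirroring the first two rules above.
# Assumption: Python syntax, so "$1" becomes r"\1".
RULES = [
    (r"\.([A-Z])", r". \1"),  # period not followed by a space
    (r"\s\?", "?"),           # space before a closing question mark
]

def apply_rules(text, rules=RULES):
    """Apply each (pattern, replacement) pair to the text, in order."""
    for pattern, replacement in rules:
        text = re.sub(pattern, replacement, text)
    return text

sample = "It was late.She asked: are you coming ?"
print(apply_rules(sample))
# prints: It was late. She asked: are you coming?
```

The rules run in the order they are listed, so a later rule sees the text as already corrected by the earlier ones.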

Following this structure, we can build a list tailored to common errors (spaces before exclamation marks, dialogue dashes, etc.). The more sophisticated the search, the more complicated the regular expression, but with patience and skill you can do almost anything.

And if you put them all in a text file in the format

# comment
expression substitution

# comment
expression substitution

...

you can apply them all at once to an already prepared epub with the epubcorrect script that I mentioned a few posts ago.
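To give an idea of how a rule file in the format above could be read and applied, here is a minimal Python sketch. It is not the epubcorrect script itself, and it makes several assumptions that may not match it: the expression and the substitution are separated by a tab, both are written in Python regex syntax (`\1` rather than `$1`), and the functions `parse_rules` and `correct` are hypothetical names. It works on a plain string, not on an epub.

```python
import re

def parse_rules(lines):
    """Return (compiled_pattern, substitution) pairs from rule-file lines.

    Assumed format: lines starting with '#' are comments, blank lines are
    skipped, and each rule line is 'expression<TAB>substitution'.
    """
    rules = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.lstrip().startswith("#"):
            continue
        expression, sep, substitution = line.partition("\t")
        if not sep:
            raise ValueError(f"rule line has no tab separator: {line!r}")
        rules.append((re.compile(expression), substitution))
    return rules

def correct(text, rules):
    """Apply every rule to the text, in file order."""
    for pattern, substitution in rules:
        text = pattern.sub(substitution, text)
    return text

# Stand-in for the lines of a rule file (in practice, an open file object).
rule_file = [
    "# period not followed by a space\n",
    "\\.([A-Z])\t. \\1\n",
    "\n",
    "# space before a closing question mark\n",
    "\\s\\?\t?\n",
]

rules = parse_rules(rule_file)
print(correct("Where is it ? Nobody knew.Nobody asked.", rules))
# prints: Where is it? Nobody knew. Nobody asked.
```

Keeping the rules in a separate file means the list can grow with each book you correct without touching the code.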
