Automate correcting typos
Many of the typographical errors that appear when you OCR-scan a book, or that are simply in the book because it has not gone through thorough editing, can in most cases be corrected automatically using regular expressions.
Let's look at some simple examples.
# Missing space after a period
Expression: "\\.([A-Z])"
Replacement: ". $1"
# Space before a closing question mark
Expression: "\\s\\?"
Replacement: "?"
# Long dash inside a word: replace it with a hyphen
Expression: "([A-Za-zÁÉÍÓÚÑáéíóúü])—([A-Za-zÁÉÍÓÚÑáéíóúü])"
Replacement: "$1-$2"
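As a rough illustration, the three rules above could be tried out in Python like this. It is only a sketch: the names `RULES` and `fix_typos` are made up for the example, Python writes back-references as `\1` where the rules above use `$1`, and the third rule assumes the character to replace is a long dash (U+2014) sitting between two letters.

```python
import re

# Each rule is a (pattern, replacement) pair.
# Note: Python uses \1 for the first captured group instead of $1.
RULES = [
    # Missing space after a period: "end.Next" -> "end. Next"
    (r"\.([A-Z])", r". \1"),
    # Space before a closing question mark: "why ?" -> "why?"
    (r"\s\?", "?"),
    # Long dash (U+2014, an assumption) inside a word -> hyphen
    (r"([A-Za-zÁÉÍÓÚÑáéíóúü])\u2014([A-Za-zÁÉÍÓÚÑáéíóúü])", r"\1-\2"),
]


def fix_typos(text):
    """Apply every rule in order and return the corrected text."""
    for pattern, replacement in RULES:
        text = re.sub(pattern, replacement, text)
    return text


if __name__ == "__main__":
    print(fix_typos("Hola.Que tal ?"))
```

The rules run in the order they are listed, so a later rule sees the text already changed by the earlier ones.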
Following this structure, we can put together a list of rules that corrects common errors (spaces before exclamation marks, dialogue dashes, etc.). The more sophisticated the search, the more complicated the regular expression, but with patience and skill you can do almost anything.
And if you put them all in a text file in the format
# comment
expression substitution
# comment
expression substitution
...
you can apply them all at once to an EPUB you have already prepared, using the epubcorrect script that I mentioned a few posts ago.
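To give an idea of how a rules file in this format could be processed, here is a small Python sketch. It is not the epubcorrect script itself, and it makes several assumptions that the format description does not settle: the expression and the substitution are separated by a tab, they are written without surrounding quotes and with single backslashes, and `$1`-style group references are translated to Python's `\g<1>` form. The function names and the file name `rules.txt` are hypothetical.

```python
import re


def load_rules(lines):
    """Parse rule lines into (compiled pattern, replacement) pairs.

    Lines starting with '#' are comments; blank lines are skipped.
    Assumed layout of a rule line: pattern<TAB>replacement.
    """
    rules = []
    for line in lines:
        line = line.rstrip("\n")
        if not line.strip() or line.startswith("#"):
            continue
        pattern, _, replacement = line.partition("\t")
        # Turn $1, $2, ... into Python's \g<1>, \g<2>, ... group references.
        replacement = re.sub(r"\$(\d+)", r"\\g<\1>", replacement)
        rules.append((re.compile(pattern), replacement))
    return rules


def apply_rules(text, rules):
    """Run every rule over the text, in file order."""
    for pattern, replacement in rules:
        text = pattern.sub(replacement, text)
    return text


# Reading the rules from a file (hypothetical name) would look like:
#     with open("rules.txt", encoding="utf-8") as f:
#         rules = load_rules(f)
```

Keeping the rules in a separate text file means the list can grow without touching the code that applies it.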