« Back to projects

Proofreading Scripts

Languages: Ruby
Summary: An implementation of Matt Might's proofreading scripts
Source code: github [.tar.gz / .zip]

Details

I wanted to make use of Matt Might’s proofreading scripts, but I’m on a Windows machine most of the time and would rather not rely on cygwin, a so I rewrote them in Ruby and customized them a bit.

I made sure to handle the case where a suspect phrase (e.g. ‘is used’, or ‘the the’) is split between two different lines.

The output has line numbers, includes the suspect words at the beginning of the line, and is more or less readable in the windows command prompt. It uses >angle brackets< to mark the word instead of color and tries to stay within a fairly short width.

When run, it automatically creates a file called feedback-<filename>. If it finds weasel words or passive voice, it adds a couple of short tips from Matt’s original blog post after the list of them.

These scripts are finished and working, but if you find any bugs please let me know.

Usage

ruby proofread.rb [filename]

Example Output

For a fun example, I ran this on Matt’s original blog post. It acts as a pretty decent test because the blog post itself must contain examples of suspect phrases in order to explain them.

Weasel words results:

 73: various - My favorite salt and pepper words/phrases are >various<, a number of, fairly, and 
 73: fairly - My favorite salt and pepper words/phrases are various, a number of, >fairly<, and 
 74: quite - >quite<. Sentences that cut these words out become stronger. 
 76: quite -  Bad:    It is >quite< difficult to find untainted samples.  Better: It is 
 79: various -  Bad:    We used >various< methods to isolate four samples.  Better: We isolated 
 85: interestingly - >interestingly<, surprisingly, remarkably, or clearly. 
 85: surprisingly - interestingly, >surprisingly<, remarkably, or clearly. 
 85: remarkably - interestingly, surprisingly, >remarkably<, or clearly. 
 85: clearly - interestingly, surprisingly, remarkably, or >clearly<. 
 89: surprisingly - Bad:    False positives were >surprisingly< low.  Better: To our surprise, false 
100: very - The two worst offenders in this category are the words >very< and extremely. These 
100: extremely - The two worst offenders in this category are the words very and >extremely<. These 
103: several - Other offenders include >several<, exceedingly, many, most, few, vast. 
103: exceedingly - Other offenders include several, >exceedingly<, many, most, few, vast. 
103: many - Other offenders include several, exceedingly, >many<, most, few, vast. 
103: few - Other offenders include several, exceedingly, many, most, >few<, vast. 
103: vast - Other offenders include several, exceedingly, many, most, few, >vast<. 
105: very -  Bad:    There is >very< close match between the two semantics.  Better: There is 
117: completely -  Bad:    We offer a >completely< different formulation of CFA.  Better: We offer a 
207: few - A >few< words on marketing 

Look out for:
    Salt and Pepper words (various, fairly, quite)
    Beholder words (interestingly, surprisingly, clearly)
    Lazy words (very, extremely, several, most, few)
    Adverbs


Passive voice results:

 141: is guaranteed -  Bad:    Termination >is guaranteed< on any input.  Better: Termination is 
 141: is  guaranteed -  Bad:    Termination is guaranteed on any input.  Better: Termination is 
 155: were added -  OK: 4 mL HCl >were added< to the solution. 
 211: be accepted - writes has to >be accepted< for publication. 

For each use of the passive highlighted, ask:
    1. Is the agent relevant yet unclear?
    2. Does the text read better with the sentence in the active?
If the answer to both questions is 'yes,' then change to the active.
If only the answer to the first question is 'yes,' then specify the agent. 


Duplicated words results:

 175: the the -  Many readers are not aware that >the the< brain will automatically ignore a 
 180: the the -  Many readers are not aware that >the the< brain will automatically ignore a 
 181: instance instance -  second >instance instance< of the word "the" when it starts a new line.  

Content by Nick Knowlson: Google+
rsstwitter