I love LaTeX. But some things that are easy to do are just not obvious, especially to the uninitiated. Like word count. Someone recently asked me how to determine the word count of a LaTeX document. The problem is that we want to ignore the LaTeX markup, so just counting ‘words’ with:
wc -w
is going to give an inflated estimate. Explicitly filtering the LaTeX markup is impractical: there are too many packages, not to mention user-defined commands. So what are we to do?
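To see how much the markup can inflate the count, here is a small sketch. The four-line snippet is made up for illustration (it is not from any real document), and only "Hello world." is text a reader would actually see:

```shell
# Made-up LaTeX snippet: three lines of markup, one line of prose.
count=$(printf '%s\n' \
    '\begin{figure}' \
    '\includegraphics[width=\textwidth]{plot.pdf}' \
    '\end{figure}' \
    'Hello world.' | wc -w | tr -d ' ')   # tr strips the padding some wc builds add
echo "$count"   # prints 5, although only 2 of those are real words
```

Every whitespace-separated chunk of markup counts as a "word", so the more figures, tables, and macros a document has, the further off the raw count drifts.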
Fortunately, the utility:
ps2ascii
can help. It converts PostScript and PDF files into plain text. By typesetting the LaTeX document we, in effect, strip out the markup. So if we typeset our LaTeX document with
pdflatex
running the following command in the terminal will return the word count:
ps2ascii mydocument.pdf | wc -w
That was easy. Let’s make it easier. If you are lucky enough to be writing your LaTeX document in TextMate, you might want to check the word count of your document as you are writing it. You could use the statistics command, ⌃ ⇧ N, but that would give the inflated estimate. It would be better to check the LaTeX document’s directory for the typeset PDF, if any, and then run the above command on it. Here is a command that does just that:
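# Derive the PDF path from the file being edited (mydocument.tex -> mydocument.pdf)
NAME="${TM_FILENAME}"
BASENAME="${NAME%.*}"
PDF="$TM_DIRECTORY/$BASENAME.pdf"
# -f is true only if the PDF exists as a regular file; the unary -a test is obsolescent
if [ -f "$PDF" ]
then ps2ascii "$PDF" | wc -w
else echo "You must typeset your document before a word count can be determined."
fi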
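If you are not in TextMate, the same check can be wrapped in a plain shell function. This is only a sketch: the name `texwc` is my own invention (it is not part of TextMate or Ghostscript), and it assumes `ps2ascii` is on your PATH and that the PDF sits next to the source file.

```shell
# texwc: hypothetical helper name, used here for illustration only.
# Usage: texwc path/to/mydocument.tex
texwc() {
    src="$1"
    pdf="${src%.*}.pdf"    # strip the extension, then append .pdf
    if [ -f "$pdf" ]; then
        # tr strips the padding some wc builds add around the number
        ps2ascii "$pdf" | wc -w | tr -d ' '
    else
        # Report on stderr and return non-zero so scripts can detect the failure
        echo "You must typeset your document before a word count can be determined." >&2
        return 1
    fi
}
```

Paste it into your shell startup file and call it as `texwc mydocument.tex`. Sending the error message to stderr keeps the function's normal output a bare number, which is handy if you want to feed it to another command.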
Here is a screenshot of the command in the Bundle Editor:
You can download the command here. Now go count some words.