I recently had to write a program for my systems programming class that finds the most frequently occurring words in a text file. To test it on a fairly large input, I downloaded a copy of the Bible.
My results:
The top 50 words and their occurrences (out of 13262 unique words) are:
64926 the
52167 and
35312 of
14048 to
13229 that
12891 in
10517 he
9851 shall
9130 for
9041 unto
8868 i
8563 his
8438 a
7990 lord
7490 they
7188 be
7119 is
6727 not
6695 him
6514 them
6300 it
6099 with
5672 all
5475 thou
4607 was
4606 thy
4531 god
4491 which
4372 my
4102 me
4077 but
4037 their
4008 said
3988 ye
3969 have
3865 will
3833 thee
3702 from
3661 as
3006 are
2869 when
2854 this
2837 were
2805 out
2793 upon
2761 man
2754 by
2625 you
2581 israel
2557 king
The program is written in C and runs quite quickly, taking about a third of a second on my 2.53 GHz Core 2 Duo.
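For anyone curious how a word counter like this can be put together, here is a minimal sketch in C. It is not the actual program from my class, just one plausible approach: a chained hash table keyed on lowercased words, followed by a sort on the counts. The names (`WordTable`, `table_add`, `count_words`, `sorted_entries`), the table size, and the rule that any non-letter ends a word are all assumptions made for illustration, so counts from this sketch may not match the list above exactly.

```c
#include <ctype.h>
#include <stdio.h>
#include <stdlib.h>
#include <string.h>

#define NBUCKETS 16384  /* assumed table size, comfortably above ~13k unique words */
#define MAXWORD  64     /* assumed limit; longer words are truncated */

typedef struct Entry {
    char *word;
    long count;
    struct Entry *next;  /* chain of words that hash to the same bucket */
} Entry;

typedef struct {
    Entry *buckets[NBUCKETS];
    size_t unique;       /* number of distinct words seen so far */
} WordTable;

/* djb2 string hash. */
static unsigned long hash_word(const char *s)
{
    unsigned long h = 5381;
    while (*s)
        h = h * 33 + (unsigned char)*s++;
    return h;
}

/* Record one occurrence of word. Returns 0 on success, -1 if out of memory. */
static int table_add(WordTable *t, const char *word)
{
    unsigned long b = hash_word(word) % NBUCKETS;
    for (Entry *e = t->buckets[b]; e; e = e->next)
        if (strcmp(e->word, word) == 0) { e->count++; return 0; }

    Entry *e = malloc(sizeof *e);
    if (!e) return -1;
    e->word = malloc(strlen(word) + 1);
    if (!e->word) { free(e); return -1; }
    strcpy(e->word, word);
    e->count = 1;
    e->next = t->buckets[b];
    t->buckets[b] = e;
    t->unique++;
    return 0;
}

/* Read the stream, lowercasing letters; any non-letter ends the current word. */
static int count_words(FILE *in, WordTable *t)
{
    char buf[MAXWORD];
    size_t len = 0;
    int c;
    do {
        c = getc(in);
        if (c != EOF && isalpha(c)) {
            if (len < MAXWORD - 1) buf[len++] = (char)tolower(c);
        } else if (len > 0) {
            buf[len] = '\0';
            if (table_add(t, buf) != 0) return -1;
            len = 0;
        }
    } while (c != EOF);
    return 0;
}

/* qsort comparator: highest count first, ties broken alphabetically. */
static int by_count_desc(const void *a, const void *b)
{
    const Entry *x = *(const Entry *const *)a;
    const Entry *y = *(const Entry *const *)b;
    if (x->count != y->count) return (x->count < y->count) ? 1 : -1;
    return strcmp(x->word, y->word);
}

/* Returns a malloc'd array of t->unique entry pointers, sorted by count.
   Returns NULL if the table is empty or allocation fails. */
static Entry **sorted_entries(const WordTable *t)
{
    if (t->unique == 0) return NULL;
    Entry **all = malloc(t->unique * sizeof *all);
    if (!all) return NULL;
    size_t n = 0;
    for (size_t b = 0; b < NBUCKETS; b++)
        for (Entry *e = t->buckets[b]; e; e = e->next)
            all[n++] = e;
    qsort(all, t->unique, sizeof *all, by_count_desc);
    return all;
}
```

To use it, a `main` would zero-initialize a `WordTable` (a `static` one works, since it is fairly large), call `count_words` on an opened file, then print the first 50 entries returned by `sorted_entries`. Sorting every unique word is more work than strictly needed for a top-50 list, but with only around 13,000 unique words it should be cheap compared to reading the file.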
ur syk bro