North Sami Letter Frequency
The letter counts were produced with the following command:
ccat -l sme -r freecorpus/converted/sme/ | tr '[a-zæøåöäáčđŋšŧž]' '[A-ZÆØÅÖÄÁČĐŊŠŦŽ]' | sed 's/\(.\)/\1 /g;' | tr ' ' '\n' | grep '[A-ZÆØÅÖÄÁČĐŊŠŦŽ]' | sort | uniq -c | sort -nr
The result, for N = 46438888, was:
- A = 12.98%
- I = 9.22%
- E = 6.81%
- T = 6.08%
- D = 6.02%
- O = 5.88%
- L = 5.34%
- U = 5.08%
- S = 5.08%
- Á = 4.72%
- G = 3.91%
- N = 3.90%
- V = 3.77%
- R = 3.54%
- M = 3.40%
- H = 2.96%
- K = 2.60%
- J = 2.17%
- B = 1.61%
- Š = 1.29%
- P = 0.95%
- Đ = 0.80%
- Č = 0.66%
- F = 0.39%
- Ž = 0.25%
- C = 0.20%
- Z = 0.11%
- Ŋ = 0.10%
- Y = 0.07%
- Ŧ = 0.03%
- Ø = 0.03%
- Å = 0.02%
- W = 0.01%
- Æ = 0.01%
- Ä = 0.01%
- X = 0.00%
- Ö = 0.00%
- Q = 0.00%
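
The command above ends in uniq -c, so it prints a raw count per letter rather than a percentage. The percentages in the list were presumably obtained by dividing each count by N. How that was actually done is not recorded here, so the following is only a minimal sketch of one way to do it. The function name to_percent is hypothetical, and the sample input is made up for illustration, not taken from the corpus.

```shell
# Hypothetical helper (not part of the original command): reads
# "count letter" lines, as printed by `uniq -c`, on standard input and
# prints each letter with its share of the total, in input order.
to_percent() {
    awk '
        # Remember each count and letter, and accumulate the total.
        { count[NR] = $1; letter[NR] = $2; total += $1 }
        END {
            if (total == 0) exit   # no input: print nothing
            for (i = 1; i <= NR; i++)
                printf "%s = %.2f%%\n", letter[i], 100 * count[i] / total
        }
    '
}

# Made-up sample input: 3 occurrences of A and 1 of B.
printf '      3 A\n      1 B\n' | to_percent
```

Appending | to_percent to the command above would apply the same conversion to the real counts. Because the input is already sorted by sort -nr, the output stays in descending order of frequency.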