North Sami Letter Frequency

The counts were produced by running the following command on the North Sami (sme) part of the converted corpus:

ccat -l sme -r freecorpus/converted/sme/ |
tr '[a-zæøåöäáčđŋšŧž]' '[A-ZÆØÅÖÄÁČĐŊŠŦŽ]' |
sed 's/\(.\)/\1 /g' |
tr ' ' '\n' |
grep '[A-ZÆØÅÖÄÁČĐŊŠŦŽ]' |
sort | uniq -c | sort -nr
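To see what each stage of the command does, the same pipeline can be tried on a short sample. This is only a sketch: `count_ascii_letters` is a hypothetical name, `printf` stands in for `ccat`, and the sample is ASCII only, because some `tr` implementations work on bytes rather than characters and may not handle letters such as Á or Š the way the command above expects.

```shell
# count_ascii_letters is a hypothetical helper for illustration only.
# It reads text on stdin and prints "count letter" lines, most frequent first.
count_ascii_letters() {
  tr '[a-z]' '[A-Z]' |       # fold lower case to upper case
    sed 's/\(.\)/\1 /g' |    # put a space after every character
    tr ' ' '\n' |            # turn the spaces into newlines: one character per line
    grep '[A-Z]' |           # keep only the lines that hold a letter
    sort | uniq -c |         # count identical letters
    sort -nr                 # highest count first
}

# Sample input; N occurs twice, every other letter once
printf 'Mun lean\n' | count_ascii_letters
```

The first output line is the count for N (2); the order of the remaining letters, which all have a count of 1, depends on how `sort` breaks ties.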

The result, as a percentage of the N = 46438888 letters counted, was:

  • A = 12,98%
  • I = 9,22%
  • E = 6,81%
  • T = 6,08%
  • D = 6,02%
  • O = 5,88%
  • L = 5,34%
  • U = 5,08%
  • S = 5,08%
  • Á = 4,72%
  • G = 3,91%
  • N = 3,90%
  • V = 3,77%
  • R = 3,54%
  • M = 3,40%
  • H = 2,96%
  • K = 2,60%
  • J = 2,17%
  • B = 1,61%
  • Š = 1,29%
  • P = 0,95%
  • Đ = 0,80%
  • Č = 0,66%
  • F = 0,39%
  • Ž = 0,25%
  • C = 0,20%
  • Z = 0,11%
  • Ŋ = 0,10%
  • Y = 0,07%
  • Ŧ = 0,03%
  • Ø = 0,03%
  • Å = 0,02%
  • W = 0,01%
  • Æ = 0,01%
  • Ä = 0,01%
  • X = 0,00%
  • Ö = 0,00%
  • Q = 0,00%
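The command ends in `uniq -c | sort -nr`, so it prints raw counts; turning them into percentages like those above takes one more step, which the page does not show. A minimal sketch of such a step, assuming `awk` is available (`to_percent` is a hypothetical name, and the counts in the example are made up):

```shell
# to_percent is a hypothetical helper, not part of the original command.
# Input:  "count letter" lines, as printed by `uniq -c | sort -nr`.
# Output: "letter = share%" lines in the same order, with two decimals.
to_percent() {
  awk '
    { count[NR] = $1; letter[NR] = $2; total += $1 }
    END {
      for (i = 1; i <= NR; i++)
        printf "%s = %.2f%%\n", letter[i], 100 * count[i] / total
    }
  '
}

# Made-up counts, for illustration only
printf '3 A\n1 B\n' | to_percent
```

For these sample counts the share of A is 3 out of 4, i.e. 75%, and that of B is 25%. In the default C locale `awk` prints a decimal point rather than the decimal comma used in the list. Appending `| to_percent` to the original pipeline would give output of the same shape as the list.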