Importing foreign languages from a CSV file into Stata


I am using Stata 12 and have encountered the following problem. I am importing a bunch of .csv files into Stata using the insheet command. The datasets may include Russian, Croatian, Turkish, etc. text, and I think they are encoded in UTF-8. In the .csv files everything is correct, but after I import them into Stata, the original strings are wrong and become strange characters. Could anyone help me with that? Can Stat/Transfer solve this problem? Does it support the .csv format?

For example, the original file looks like this: [image: screenshot of the original file]

My code is like this:

    insheet using name.csv, c n
    save name.dta, replace

The result looks like this: [image: screenshot of the imported data in Stata]

I have also tried adjusting the script in the font options, but it does not work.

As @Nick Cox commented earlier, the problem is that Stata 12 doesn't support Unicode/UTF-8 encoding. And no, Stat/Transfer wouldn't resolve the problem (please refer to this explanation).

You can do the trick using an online decoder or MS Word. Let's take one language first, say, Russian, as in your screenshots. Then check out the correct encodings for Croatian, Turkish, and any other languages you have.

  1. Save the string variable from the .csv file as plain text (.txt), choosing the UTF-8 encoding option.
  2. Do the encoding conversion:
    • use iconv, as suggested by @Dimitriy V. Masterov, or
    • use an online tool, such as this one: upload the .txt file, choose UTF-8 as the source encoding and an output encoding according to the language of interest (for Russian, it must be CP1251), click the "Convert" button, and save the output file, or
    • if you have MS Office, you can use MS Word for the same purpose. Right-click on the .txt file, choose "Open with...", and choose to open it with MS Word. In the window that appears, confirm that the file encoding is "Unicode (UTF-8)", open the file, click "Save As...", and save it as plain text. In the next window that appears, choose "Cyrillic (Windows)" and check "Insert line breaks". Save.
  3. Check the new .txt file - it should still show strange characters (like ÌßÑÎÊÎÌÁÈÍÀÒ), but Stata can display them properly.
  4. Copy-paste the new string variable into the Stata Data Editor, right-click on the variable, choose "Font...", and set the script to "Cyrillic". You should see the correct names on screen both in the Data Editor and in the Results window (even though the string itself is intact).
    [image: example of CP1251 encoding in Stata]
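If neither iconv nor MS Word is at hand, the conversion in step 2 can also be scripted. The following is only a minimal sketch in Python, not part of the original answer: the function name `convert_utf8_file`, the file names, and the language-to-code-page table are all assumptions made for illustration (CP1251 for Russian comes from the step above; CP1250 for Croatian and CP1254 for Turkish are the usual Windows code pages for those languages, so verify them against your own data).

```python
# Sketch: re-encode a UTF-8 text/CSV file into a single-byte Windows code page.
# The mapping below is an assumption for illustration; check it for your data.
CODE_PAGES = {
    "russian": "cp1251",
    "croatian": "cp1250",
    "turkish": "cp1254",
}


def convert_utf8_file(src, dst, language):
    """Read src as UTF-8 and write it to dst in the code page for language.

    Returns the name of the target encoding that was used.
    """
    target = CODE_PAGES[language]
    # "utf-8-sig" also accepts files that start with a byte order mark;
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()
    # Characters that do not exist in the target code page become "?".
    with open(dst, "w", encoding=target, errors="replace", newline="") as fout:
        fout.write(text)
    return target


if __name__ == "__main__":
    import sys

    # Hypothetical usage: python convert.py name.csv name_cp1251.csv russian
    if len(sys.argv) == 4:
        print(convert_utf8_file(sys.argv[1], sys.argv[2], sys.argv[3]))
```

The converted file can then be read with insheet as before, and the font script still has to be set as described in step 4. Note that one file can only be converted to one code page, so a dataset mixing Russian and Turkish strings would need to be split by language first.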

Depending on your OS, you might need to install the appropriate languages first.
Hope this helps.

