Importing foreign languages from a CSV file to Stata
I am using Stata 12 and have encountered the following problem. I am importing a bunch of .csv files into Stata using the insheet command. The datasets may include Russian, Croatian, Turkish, etc. I think they are encoded in "UTF-8". In the .csv files, the strings are correct. After I import them into Stata, the original strings are incorrect and become strange characters. Could anyone please help me with that? Can Stat/Transfer solve the problem? Does it support the .csv format?
For example, the original file looks like this:
My code is:
insheet using name.csv, c n
save name.dta, replace
The result looks like this:
I have also tried to adjust the script in the Fonts option, but it does not work.
As @Nick Cox commented earlier, the problem is that Stata doesn't support Unicode/UTF-8 encoding. And no, Stat/Transfer wouldn't resolve the problem (please refer to this explanation).
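To see why the imported strings turn into strange characters, here is a minimal Python sketch. Python is not part of the original workflow, and both the sample word and the use of Latin-1 as a stand-in for a "single-byte view" are assumptions made for illustration. UTF-8 stores each Cyrillic letter as two bytes, while a program without Unicode support treats every byte as a separate character.

```python
# Illustration only: the sample word and the Latin-1 "single-byte view"
# are assumptions, not a description of Stata's internals.
word = "МЯСОКОМБИНАТ"  # 12 Cyrillic letters

utf8_bytes = word.encode("utf-8")
print(len(word), len(utf8_bytes))  # 12 24

# A single-byte reader maps every byte to its own character, so the
# 24 bytes turn into 24 unrelated symbols instead of 12 letters.
garbled = utf8_bytes.decode("latin-1")
print(len(garbled))  # 24
```

The bytes themselves are not damaged, which is why re-encoding the text into a single-byte code page that the program can display is enough to work around the problem.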
You can work around this with a trick, using an online decoder or MS Word. Let's take one language first, say, Russian, as in your screenshots. You will need to look up the correct encodings for Croatian, Turkish, and the other languages you have.
- Save the string variable from the .csv file as plain text (.txt), choosing the UTF-8 encoding option.
- Convert the encoding, using one of the following:
  - Use iconv, as suggested by @Dimitriy V. Masterov, or
  - Use an online tool, such as this one: upload the .txt file, choose UTF-8 as the source encoding and the output encoding according to the language of interest (for Russian, it must be CP1251), click the "Convert" button, and save the output file, or
  - If you have MS Office, you can use MS Word for the same purpose. Right-click on the .txt file, choose "Open with...", and choose to open it with MS Word. In the window that appears, confirm that the file encoding is "Unicode (UTF-8)" and open the file. Click "Save As..." and save it as plain text. In the new window that appears, choose "Cyrillic (Windows)" and mark "Insert line breaks". Save.
- Check the new .txt file. It should still show strange characters (like ÌßÑÎÊÎÌÁÈÍÀÒ), but Stata can now display them properly.
- Copy and paste the new string variable into the Stata Data Editor, right-click on the variable, choose "Font...", and set the script to "Cyrillic". You should see the correct names on screen both in the Data Editor and in the Results window (even though the string itself is intact).
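The conversion step can also be scripted. The following is a minimal Python sketch, not part of the original answer: the function name `convert_utf8_file`, the file names, and the `CODE_PAGES` table are hypothetical. CP1251 for Russian comes from the steps above. cp1250 (Central European) for Croatian and cp1254 for Turkish are the usual Windows code pages for those languages, but please verify them against your own data.

```python
import os
import tempfile

# Hypothetical lookup table: target Windows code page per language.
CODE_PAGES = {
    "russian": "cp1251",
    "croatian": "cp1250",
    "turkish": "cp1254",
}

def convert_utf8_file(src, dst, language):
    """Re-encode a UTF-8 text file into the single-byte code page for `language`."""
    target = CODE_PAGES[language]
    # "utf-8-sig" also drops a byte order mark if the file starts with one;
    # newline="" keeps the original line endings untouched.
    with open(src, "r", encoding="utf-8-sig", newline="") as fin:
        text = fin.read()
    # Default strict error handling raises UnicodeEncodeError if a character
    # does not exist in the target code page (a sign of the wrong code page).
    with open(dst, "w", encoding=target, newline="") as fout:
        fout.write(text)

# Demo on a throwaway file (file names are placeholders).
workdir = tempfile.mkdtemp()
src = os.path.join(workdir, "name.txt")
dst = os.path.join(workdir, "name_cp1251.txt")
with open(src, "w", encoding="utf-8") as f:
    f.write("МЯСОКОМБИНАТ\n")

convert_utf8_file(src, dst, "russian")

with open(dst, "rb") as f:
    raw = f.read()

# Viewed through a Western single-byte code page, the converted bytes
# look like the "strange characters" mentioned in the steps above.
print(raw.decode("cp1252").strip())  # ÌßÑÎÊÎÌÁÈÍÀÒ
```

A file that mixes several languages will usually not fit a single code page, so in that case it may be easier to split the data by language before converting.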

Depending on your OS, you might need to install the appropriate languages first.
Hope this helps.