C++ - UTF-8 to URI percent encoding
I'm trying to convert Unicode code points to percent-encoded UTF-8 code units.
The Unicode -> UTF-8 conversion seems to be working correctly, as shown by testing with Hindi and Chinese characters: they show up correctly in Notepad++ with UTF-8 encoding and can be translated properly.
I thought percent encoding was as simple as adding '%' in front of each UTF-8 code unit, but that doesn't quite work. Rather than the expected %E5%84%A3, I'm seeing %xE5%x84%xA3 (for Unicode U+5123).

What am I doing wrong?
I've added my code below (note that utf8.h belongs to the utf8-cpp library).

    #include <fstream>
    #include <iostream>
    #include <vector>
    #include "utf8.h"

    std::string unicode_to_utf8_units(int32_t unicode) {
        unsigned char u[5] = {0,0,0,0,0};
        unsigned char *iter = u, *limit = utf8::append(unicode, u);
        std::string s;
        for (; iter != limit; ++iter) {
            s.push_back(*iter);
        }
        return s;
    }

    int main() {
        std::ofstream ofs("test.txt", std::ios_base::out);
        if (!ofs.good()) {
            std::cout << "ofstream encountered problem." << std::endl;
            return 1;
        }
        utf8::uint32_t unicode = 0x5123;
        auto s = unicode_to_utf8_units(unicode);
        for (auto &c : s) {
            ofs << "%" << c;
        }
        ofs.close();
        return 0;
    }

You need to convert the byte values to their corresponding ASCII strings. For example:
"é" in UTF-8 is the byte sequence { 0xC3, 0xA9 }. Please note that these are bytes, i.e. char values in C++.
Each byte needs to be converted to "%C3" and "%A9" respectively.
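The best way to do this is to use a std::ostringstream (from <sstream>):

    std::ostringstream out;
    std::string utf8str = "\xE5\x84\xA3";
    for (std::size_t i = 0; i < utf8str.length(); ++i) {
        // std::setw(2) and std::setfill('0') (from <iomanip>) keep bytes
        // below 0x10 at two hex digits, e.g. %0A instead of %A.
        out << '%' << std::hex << std::uppercase << std::setw(2) << std::setfill('0')
            << (int)(unsigned char)utf8str[i];
    }

Or in C++11:

    for (auto c : utf8str) {
        out << '%' << std::hex << std::uppercase << std::setw(2) << std::setfill('0')
            << (int)(unsigned char)c;
    }

Please note that the bytes need to be cast to int, because otherwise the << operator would write the raw character itself rather than its numeric value. Casting to unsigned char first is needed because otherwise, where char is signed, the sign bit would propagate into the int value, causing negative values to be printed as, for example, FFFFFFE5.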
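For reference, the loop above can be wrapped into a small reusable function. This is only a minimal sketch: the name percent_encode_all is hypothetical (it is not part of utf8-cpp or the standard library), and it assumes you want every byte encoded, including plain ASCII characters that a URI would normally leave as they are.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper (not from utf8-cpp or the standard library):
// writes every byte of a UTF-8 encoded string as "%XX" with two
// uppercase hex digits.
std::string percent_encode_all(const std::string& utf8) {
    std::ostringstream out;
    // hex, uppercase and the fill character stay in effect for the stream.
    out << std::hex << std::uppercase << std::setfill('0');
    // Reading each element as unsigned char avoids sign extension when
    // the byte is widened to int below.
    for (unsigned char byte : utf8) {
        // setw only applies to the next insertion, so it is repeated
        // for every byte, after the '%' has been written.
        out << '%' << std::setw(2) << static_cast<int>(byte);
    }
    return out.str();
}
```

With this sketch, percent_encode_all("\xE5\x84\xA3") should give %E5%84%A3 for U+5123, and a newline byte comes out as %0A rather than %A.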