C++ - UTF-8 to URI percent encoding
I'm trying to convert Unicode code points to percent-encoded UTF-8 code units.
The Unicode -> UTF-8 conversion seems to be working correctly. I tested it with Hindi and Chinese characters, which show up correctly in Notepad++ with UTF-8 encoding and can be translated properly.
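To cross-check the Unicode -> UTF-8 step, you can derive the bytes for a code point by hand. The sketch below is a hypothetical stand-alone encoder written for illustration. It is not UTF8-CPP's utf8::append, and it assumes the input is a valid Unicode scalar value: it does not reject surrogates or values above U+10FFFF.

```cpp
#include <cstdint>
#include <string>

// Hypothetical helper for illustration only (not part of UTF8-CPP):
// encodes one code point as a UTF-8 byte sequence. Assumes cp is a valid
// scalar value; no checks for surrogates or cp > 0x10FFFF.
std::string encode_utf8(std::uint32_t cp) {
    std::string out;
    if (cp < 0x80) {                      // 1 byte:  0xxxxxxx
        out.push_back(static_cast<char>(cp));
    } else if (cp < 0x800) {              // 2 bytes: 110xxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xC0 | (cp >> 6)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else if (cp < 0x10000) {            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        out.push_back(static_cast<char>(0xE0 | (cp >> 12)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    } else {                              // 4 bytes: 11110xxx then three 10xxxxxx
        out.push_back(static_cast<char>(0xF0 | (cp >> 18)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 12) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | ((cp >> 6) & 0x3F)));
        out.push_back(static_cast<char>(0x80 | (cp & 0x3F)));
    }
    return out;
}
```

For U+5123 this produces the three bytes E5 84 A3, which are exactly the bytes that the expected %E5%84%A3 spells out.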
I thought percent-encoding was as simple as adding '%' in front of each UTF-8 code unit, but that doesn't quite work. Rather than the expected %E5%84%A3, I'm seeing %xE5%x84%xA3 (for Unicode U+5123).
What am I doing wrong?
I've added my code (note that utf8.h belongs to the UTF8-CPP library).
#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include "utf8.h"

std::string unicode_to_utf8_units(int32_t unicode)
{
    unsigned char u[5] = {0,0,0,0,0};
    unsigned char *iter = u, *limit = utf8::append(unicode, u);
    std::string s;
    for (; iter != limit; ++iter) {
        s.push_back(*iter);
    }
    return s;
}

int main()
{
    std::ofstream ofs("test.txt", std::ios_base::out);
    if (!ofs.good()) {
        std::cout << "ofstream encountered a problem." << std::endl;
        return 1;
    }

    utf8::uint32_t unicode = 0x5123;
    auto s = unicode_to_utf8_units(unicode);
    for (auto &c : s) {
        ofs << "%" << c;
    }
    ofs.close();
    return 0;
}
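You can reproduce the symptom without writing a file. The hypothetical helper below runs the same loop into a std::ostringstream instead of test.txt. Because operator<< for a char inserts the character itself, each '%' is followed by the raw byte (0xE5, 0x84, 0xA3), not by two hex digits.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: the question's output loop, redirected to a string
// stream so the result can be inspected instead of written to test.txt.
std::string question_loop(const std::string& utf8_bytes) {
    std::ostringstream out;
    for (char c : utf8_bytes) {
        out << "%" << c;  // operator<<(ostream&, char) writes the raw byte
    }
    return out.str();
}
```

The result is six bytes, not nine characters. An editor then has to display lone bytes such as 0xE5, which are not valid UTF-8 on their own. That is presumably where the xE5-style markers come from.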
You need to convert the byte values to their corresponding ASCII hex strings. For example, "é" in UTF-8 is the value { 0xC3, 0xA9 }. Please note that these are bytes, i.e. char values in C++. Each byte needs to be converted to "%C3" and "%A9", respectively.
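The conversion itself just splits each byte into two hex digits. Here is a minimal sketch that uses a hex-digit lookup table instead of streams. This is a swapped-in technique, not the answer's method, and the name percent_encode_all is hypothetical.

```cpp
#include <string>

// Hypothetical helper: turns every byte into "%XX" using a lookup table
// (an alternative to the stream-based approach, not taken from the post).
std::string percent_encode_all(const std::string& bytes) {
    static const char hex_digits[] = "0123456789ABCDEF";
    std::string out;
    out.reserve(bytes.size() * 3);
    for (char ch : bytes) {
        // Go through unsigned char so bytes >= 0x80 are never negative.
        unsigned char b = static_cast<unsigned char>(ch);
        out.push_back('%');
        out.push_back(hex_digits[b >> 4]);    // high nibble
        out.push_back(hex_digits[b & 0x0F]);  // low nibble
    }
    return out;
}
```

Each byte is split into its high and low nibble, so it always yields exactly two digits. For example, "\xC3\xA9" ("é") becomes "%C3%A9".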
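The best way is to use a std::ostringstream (from <sstream>):

std::ostringstream out;
std::string utf8str = "\xe5\x84\xa3";
// std::setw(2) and std::setfill('0') (from <iomanip>) keep bytes below 0x10 at two digits, e.g. "%0A"
for (std::size_t i = 0; i < utf8str.length(); ++i) {
    out << '%' << std::hex << std::uppercase << std::setw(2) << std::setfill('0')
        << (int)(unsigned char)utf8str[i];
}

Or, in C++11:

for (auto c : utf8str) {
    out << '%' << std::hex << std::uppercase << std::setw(2) << std::setfill('0')
        << (int)(unsigned char)c;
}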
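Wrapped into a self-contained function, the stream approach might look like the sketch below. One assumption goes beyond the answer. Following RFC 3986, unreserved characters (ASCII letters, digits, '-', '.', '_', '~') are copied unchanged, and only the remaining bytes are encoded. The name uri_percent_encode is hypothetical.

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: percent-encodes a UTF-8 string, leaving RFC 3986
// unreserved characters as-is and writing every other byte as %XX.
std::string uri_percent_encode(const std::string& utf8str) {
    std::ostringstream out;
    // hex, uppercase and fill persist; setw(2) must be repeated per number.
    out << std::hex << std::uppercase << std::setfill('0');
    for (char ch : utf8str) {
        unsigned char b = static_cast<unsigned char>(ch);
        bool unreserved = (b >= 'A' && b <= 'Z') || (b >= 'a' && b <= 'z') ||
                          (b >= '0' && b <= '9') ||
                          b == '-' || b == '.' || b == '_' || b == '~';
        if (unreserved) {
            out << ch;                                   // copied unchanged
        } else {
            out << '%' << std::setw(2) << static_cast<int>(b);
        }
    }
    return out.str();
}
```

For example, "a b" becomes "a%20b", and the bytes of U+5123 become "%E5%84%A3". Encoding every byte, as the loops above do, still produces a valid URI. RFC 3986 merely says producers should not percent-encode unreserved characters.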
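Please note that the bytes need to be cast to int, because otherwise the << operator writes the literal binary value (the raw byte). Casting to unsigned char first is needed because otherwise, on platforms where char is signed, the sign bit propagates into the int value. That causes negative values to be output, such as FFFFFFE5.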
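You can check the effect of the two casts in isolation. The two hypothetical helpers below format the same char twice: once through a plain int cast, and once through unsigned char first.

```cpp
#include <sstream>
#include <string>

// Hypothetical helper: formats a char in hex after a plain int cast.
// If char is signed, bytes >= 0x80 become negative ints here.
std::string hex_via_int(char c) {
    std::ostringstream out;
    out << std::hex << std::uppercase << static_cast<int>(c);
    return out.str();
}

// Hypothetical helper: casts to unsigned char first, so the value is
// always in 0..255 regardless of whether char is signed.
std::string hex_via_unsigned_char(char c) {
    std::ostringstream out;
    out << std::hex << std::uppercase
        << static_cast<int>(static_cast<unsigned char>(c));
    return out.str();
}
```

On a platform where char is signed and int is 32 bits (typical of x86 compilers), hex_via_int('\xE5') typically yields "FFFFFFE5". hex_via_unsigned_char('\xE5') yields "E5" everywhere.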