C++ - UTF-8 to URI percent encoding


I'm trying to convert Unicode code points to percent-encoded UTF-8 code units.

The Unicode -> UTF-8 conversion seems to be working correctly: in testing, Hindi and Chinese characters show up correctly in Notepad++ with UTF-8 encoding, and they can be translated properly.

I thought percent encoding was as simple as adding '%' in front of each UTF-8 code unit, but that doesn't quite work. Rather than the expected %E5%84%A3, I'm seeing %xE5%x84%xA3 (for Unicode U+5123).


What am I doing wrong?

I've added the code (note that utf8.h belongs to the utf8-cpp library).

#include <cstdint>
#include <fstream>
#include <iostream>
#include <string>
#include <vector>
#include "utf8.h"

std::string unicode_to_utf8_units(int32_t unicode) {
    unsigned char u[5] = {0,0,0,0,0};
    unsigned char *iter = u, *limit = utf8::append(unicode, u);
    std::string s;
    for (; iter != limit; ++iter) {
        s.push_back(*iter);
    }
    return s;
}

int main() {
    std::ofstream ofs("test.txt", std::ios_base::out);
    if (!ofs.good()) {
        std::cout << "ofstream encountered problem." << std::endl;
        return 1;
    }

    utf8::uint32_t unicode = 0x5123;
    auto s = unicode_to_utf8_units(unicode);
    for (auto &c : s) {
        ofs << "%" << c;
    }

    ofs.close();

    return 0;
}

You need to convert the byte values to their corresponding ASCII strings. For example:

"é" in UTF-8 is the value { 0xC3, 0xA9 }. Please note that these are bytes, which are char values in C++.

Each byte needs to be converted to "%C3" and "%A9" respectively.
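To make the per-byte conversion concrete, here is a minimal hand-rolled sketch. The helper name percent_encode_byte is hypothetical (it is not part of utf8-cpp or the standard library); it simply looks up the two hex digits of a byte in a table.

```cpp
#include <string>

// Hypothetical helper: turns one byte into its three-character "%XX" form.
std::string percent_encode_byte(unsigned char byte) {
    static const char digits[] = "0123456789ABCDEF";
    std::string out = "%";
    out += digits[byte >> 4];    // high nibble, e.g. 0xC3 -> 'C'
    out += digits[byte & 0x0F];  // low nibble,  e.g. 0xC3 -> '3'
    return out;
}
```

Taking an unsigned char parameter means a char with the high bit set is converted to a value in 0..255 before the shift and mask are applied.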

The best way is to use std::ostringstream (from <sstream>):

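// needs <sstream>, <iomanip> and <string>
std::ostringstream out;
std::string utf8str = "\xe5\x84\xa3";

for (std::size_t i = 0; i < utf8str.length(); ++i) {
    // setw(2) with a '0' fill keeps bytes below 0x10 at two hex digits
    out << '%' << std::hex << std::uppercase << std::setfill('0')
        << std::setw(2) << (int)(unsigned char)utf8str[i];
}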

Or in C++11:

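for (auto c : utf8str) {
    // setw(2) with a '0' fill keeps bytes below 0x10 at two hex digits
    out << '%' << std::hex << std::uppercase << std::setfill('0')
        << std::setw(2) << (int)(unsigned char)c;
}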

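Please note that the bytes need to be cast to int, because otherwise the << operator treats them as characters and writes the literal byte value instead of its hex digits. Casting to unsigned char first is needed because otherwise, where char is signed, the sign bit propagates into the int value, causing output of negative values such as FFFFFFE5.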
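Putting the pieces together, the loop can be wrapped in a small function. This is only a sketch under stated assumptions: the name percent_encode_bytes is hypothetical, the input is assumed to be valid UTF-8 already, and every byte is encoded (a real URI encoder would normally leave unreserved ASCII characters such as letters and digits untouched).

```cpp
#include <iomanip>
#include <sstream>
#include <string>

// Hypothetical helper: percent-encodes every byte of an already
// UTF-8 encoded string, e.g. the output of unicode_to_utf8_units().
std::string percent_encode_bytes(const std::string& utf8) {
    std::ostringstream out;
    // hex, uppercase and the fill character stay in effect on the stream.
    out << std::hex << std::uppercase << std::setfill('0');
    // Iterating as unsigned char avoids sign extension in the int cast.
    for (unsigned char byte : utf8) {
        // setw applies only to the next insertion, so repeat it per byte.
        out << '%' << std::setw(2) << static_cast<int>(byte);
    }
    return out.str();
}
```

In the question's program, writing ofs << percent_encode_bytes(s) in place of the loop over s should then produce %E5%84%A3 for U+5123.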

