Under Linux, where strings are narrow (UTF-8), JSON string parsing incorrectly handles \uXXXX escape sequences (e.g. \u20AC = €).
In version 2.2.0, json_parsing.cpp line 724, the 16-bit char value is truncated to 8 bits instead of being UTF-8 encoded (for \u20AC only the low byte, 0xAC, is kept):
// Construct the character based on the decoded number
ch = static_cast<CharType>(decoded & 0xFFFF); // <-- wrong here: ch is a char, so the high byte is lost
token.string_val.push_back(ch);
Proposed patch (it could probably be optimized with manual UTF-8 encoding; see the sketch after the patch):
#if defined _UTF16_STRINGS
// Construct the character based on the decoded number
ch = static_cast<CharType>(decoded & 0xFFFF);
token.string_val.push_back(ch);
#else
char16_t tmp[2] = { static_cast<char16_t>(decoded), 0 };
std::string decoded_utf8 = to_utf8string(tmp);
token.string_val.append(decoded_utf8);
#endif
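As a rough sketch of that manual encoding (assuming decoded holds a single BMP code point and that surrogate pairs are handled elsewhere), the #else branch could write the UTF-8 bytes directly instead of going through to_utf8string:

if (decoded < 0x80)
{
    // 1-byte sequence: plain ASCII
    token.string_val.push_back(static_cast<char>(decoded));
}
else if (decoded < 0x800)
{
    // 2-byte sequence: 110xxxxx 10xxxxxx
    token.string_val.push_back(static_cast<char>(0xC0 | (decoded >> 6)));
    token.string_val.push_back(static_cast<char>(0x80 | (decoded & 0x3F)));
}
else
{
    // 3-byte sequence: 1110xxxx 10xxxxxx 10xxxxxx
    token.string_val.push_back(static_cast<char>(0xE0 | (decoded >> 12)));
    token.string_val.push_back(static_cast<char>(0x80 | ((decoded >> 6) & 0x3F)));
    token.string_val.push_back(static_cast<char>(0x80 | (decoded & 0x3F)));
}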
A test should be added in parsing_tests.cpp: \u escape sequences are currently not tested with code points higher than 127.
str = json::value::parse(U("\"\\u20AC\""));
string_t euro = to_string_t("\xE2\x82\xAC");
VERIFY_ARE_EQUAL(euro, str.as_string());
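For completeness, a two-byte UTF-8 sequence could also be covered (hypothetical additional case, same assertions as above):
str = json::value::parse(U("\"\\u00E9\""));
string_t eacute = to_string_t("\xC3\xA9");
VERIFY_ARE_EQUAL(eacute, str.as_string());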