Commented Unassigned: extract_utf16string doesn't return the whole body [387]

```
#include <iostream>
#include <string>
#include <tchar.h>                 // _tmain / _TCHAR (Windows only)
#include <cpprest/http_client.h>

using namespace web;               // uri_builder
using namespace web::http;         // http_response, methods, status_codes
using namespace web::http::client; // http_client

void test_pageGetUTF8(std::wstring url){
    http_client client(url);
    auto query = uri_builder().append_query(L"q", L"test").to_string();
    client.request(methods::GET, query)
    .then([](http_response response)->pplx::task<std::string>
    {
        std::string rv;
        auto status = response.status_code();
        std::cout << "Status is: " << status << std::endl;
        if (status == status_codes::OK)
        {
            response.content_ready().get();
            size_t len = response.headers().content_length();
            std::cout << "Length is: " << len << std::endl;
            std::string data = response.extract_utf8string(false).get();
            // Content-Length counts bytes, and so does size() on a UTF-8 std::string.
            if (data.size() == len){
                rv = "OK";//data;
            }
            else{
                rv = "Failed to retrieve the whole body";
            }
        }
        else{
            rv = "error ";
            // status is an integer; appending it directly would add a single char.
            rv += std::to_string(status);
        }
        return pplx::task_from_result(rv);
    })
    .then([](pplx::task<std::string> data){
        std::cout << data.get() << std::endl;
    }).wait();
}

void test_pageGetUTF16(std::wstring url){
    http_client client(url);
    auto query = uri_builder().append_query(L"q", L"test").to_string();
    client.request(methods::GET, query)
    .then([](http_response response)->pplx::task<utility::string_t>
    {
        utility::string_t rv;
        auto status = response.status_code();
        std::wcout << "Status is: " << status << std::endl;
        if (status == status_codes::OK)
        {
            response.content_ready().get();
            size_t len = response.headers().content_length();
            std::wcout << "Length is: " << len << std::endl;
            utility::string_t data = response.extract_utf16string(false).get();
            // Note: data.size() counts UTF-16 code units, while len counts bytes.
            if (data.size() == len){
                rv = U("OK");//data;
            }
            else{
                rv = U("Failed to retrieve the whole body");
            }
        }
        else{
            rv = U("error ");
            rv += utility::conversions::to_string_t(std::to_string(status));
        }
        return pplx::task_from_result(rv);
    })
    .then([](pplx::task<utility::string_t> data){
        std::wcout << data.get() << std::endl << std::endl;
    }).wait();
}

int _tmain(int argc, _TCHAR* argv[])
{
    test_pageGetUTF8(U("http://www.codeproject.com"));
    test_pageGetUTF16(U("http://www.codeproject.com"));

    test_pageGetUTF8(U("https://duckduckgo.com/"));
    test_pageGetUTF16(U("https://duckduckgo.com/"));

    return 0;
}
```

It works fine when tested with the second URL (both UTF-8 and UTF-16), but when tested with codeproject.com, the UTF-16 version fails to retrieve the whole body (the result is usually around 20 characters shorter).
Comments: Hi mihai_qwi,

A Unicode code point doesn't necessarily take the same number of bytes in the UTF-8 encoding as it does in the UTF-16 encoding. std::string::size() returns the number of 1-byte characters, while std::wstring::size() returns the number of 2-byte characters (on Windows, where utility::string_t is std::wstring). Content-Length is a byte count, so only the size of the UTF-8 string can be expected to match it.

For example, consider the Euro sign, code point U+20AC. In UTF-8 it takes the following 3 bytes: 11100010 10000010 10101100. In UTF-16, however, it takes only the following 2 bytes: 00100000 10101100. Take a look at Wikipedia [here](http://en.wikipedia.org/wiki/UTF-8).

I ran your sample again and did notice that the std::wcout call doesn't print the closing </html> tag. I then took a look at the actual string being printed and noticed that the end of the string does indeed contain all the data, including </html>. If you replace the wcout call with a wprintf call like the following, all the data is printed out as expected.

```
auto str = data.get();
wprintf(L"%ls\n\n", str.c_str());
```

I'm not sure what is going on here, but there appears to be some sort of issue with wcout printing out all the data, because it is indeed in the string.

Let me know if you have any other questions or issues.

Steve
