
Cannot read Chinese characters with Json::CharReaderBuilder in Unicode character set #1134


Closed
longlei opened this issue Feb 1, 2020 · 8 comments

Comments

@longlei

longlei commented Feb 1, 2020

Describe the bug
Platform: Win10 x64, VS2019, jsoncpp-master
Language: MFC

There are some Chinese characters in "setting.json", which is encoded as UTF-8, and my MFC project uses the Unicode character set. When I use Json::parseFromStream() with a CharReaderBuilder to read "setting.json", all of the Chinese characters become garbled, while all of the English characters are correct.
So how should I read a JSON file containing Chinese characters correctly?

The code:

Json::CharReaderBuilder jsonBuilder;
Json::Value jsonRoot;

std::ifstream ifs;
ifs.open("D://Device.json", std::ios::binary);

jsonBuilder["collectComments"] = FALSE;
JSONCPP_STRING errs;
if (parseFromStream(jsonBuilder, ifs, &jsonRoot, &errs))
{
	UINT len = 0;
	len = jsonRoot.size();
}
for (UINT i = 0; i < jsonRoot.size(); i++)
{
	CString tmp;
	tmp = jsonRoot[i]["SoftwwareName"].asCString(); // tmp is garbled; it does not show "测试用软件"
}

The setting.json:

[
    {
        "DeviceName":"设备名称",
        "SoftwwareName":"测试用软件",
        "IsUsed":"yes"
    }
]

Any idea?

@dota17
Member

dota17 commented Feb 3, 2020

No problems in my environment.
I think the problem is that your MFC project displays the resulting characters as Unicode (UTF-16) characters, although they are encoded as UTF-8.

In my environment, when the console uses another character set, the output is garbled, but when the console uses UTF-8, it is fine.
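One way to check that the parsed strings really are UTF-8, and that only the display is wrong, is to dump the raw bytes of a value. Below is a minimal standard-library sketch; hex_bytes and kSampleUtf8 are hypothetical names, and the sample is "测试用软件" written out as explicit UTF-8 bytes (three per character).

```cpp
#include <string>

// Hypothetical sample: "测试用软件" as explicit UTF-8 bytes, 3 bytes per character.
const std::string kSampleUtf8 =
    "\xE6\xB5\x8B\xE8\xAF\x95\xE7\x94\xA8\xE8\xBD\xAF\xE4\xBB\xB6";

// Render each byte as two lowercase hex digits, separated by spaces.
// Hypothetical helper: call it on e.g. jsonRoot[i]["SoftwwareName"].asString()
// and compare the result with the expected UTF-8 bytes.
std::string hex_bytes(const std::string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (unsigned char c : s) {
        if (!out.empty()) out += ' ';
        out += digits[c >> 4];    // high nibble
        out += digits[c & 0x0F];  // low nibble
    }
    return out;
}
```

If the dump of the parsed value reads e6 b5 8b e8 af 95 e7 94 a8 e8 bd af e4 bb b6, the data is intact UTF-8 and the problem lies in how it is converted or displayed.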

@longlei
Author

longlei commented Feb 3, 2020

Yes, my MFC project uses the Unicode character set, so it cannot show the Chinese text correctly.
I have to read the UTF-8-encoded file, transcode it into a Unicode string, and then have jsoncpp read from that Unicode string. This approach works.
But how should I do it if I want jsoncpp to read from the file directly?
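On Windows, the transcoding step described above is usually done with MultiByteToWideChar and CP_UTF8 (or ATL's CA2W(str, CP_UTF8)). Since jsoncpp keeps string values as UTF-8, one option is to parse the file directly with parseFromStream and transcode each value only when it goes into a wide CString; note that assigning a char* to a CString in a Unicode build converts it with the ANSI code page, not UTF-8. As a portable illustration, here is a minimal standard-library sketch of UTF-8 to UTF-16 decoding; utf8_to_utf16 is a hypothetical helper, not part of jsoncpp or MFC.

```cpp
#include <cstddef>
#include <stdexcept>
#include <string>

// Hypothetical helper: decode UTF-8 bytes (e.g. from Json::Value::asString())
// into UTF-16, the encoding of wide strings on Windows.
// Throws std::runtime_error on malformed input instead of guessing.
std::u16string utf8_to_utf16(const std::string& in) {
    // Smallest code point allowed for a sequence of each length (rejects overlong forms).
    static const char32_t kMin[] = {0, 0, 0x80, 0x800, 0x10000};
    std::u16string out;
    std::size_t i = 0;
    while (i < in.size()) {
        unsigned char b0 = static_cast<unsigned char>(in[i]);
        char32_t cp;
        std::size_t len;
        if (b0 < 0x80)                { cp = b0;        len = 1; }
        else if ((b0 & 0xE0) == 0xC0) { cp = b0 & 0x1F; len = 2; }
        else if ((b0 & 0xF0) == 0xE0) { cp = b0 & 0x0F; len = 3; }
        else if ((b0 & 0xF8) == 0xF0) { cp = b0 & 0x07; len = 4; }
        else throw std::runtime_error("invalid UTF-8 lead byte");
        if (i + len > in.size()) throw std::runtime_error("truncated UTF-8 sequence");
        // Each continuation byte contributes 6 payload bits.
        for (std::size_t k = 1; k < len; ++k) {
            unsigned char b = static_cast<unsigned char>(in[i + k]);
            if ((b & 0xC0) != 0x80) throw std::runtime_error("invalid continuation byte");
            cp = (cp << 6) | (b & 0x3F);
        }
        if (cp < kMin[len] || cp > 0x10FFFF || (cp >= 0xD800 && cp <= 0xDFFF))
            throw std::runtime_error("invalid UTF-8 code point");
        if (cp < 0x10000) {
            out += static_cast<char16_t>(cp);  // BMP: one UTF-16 unit
        } else {
            cp -= 0x10000;                     // supplementary plane: surrogate pair
            out += static_cast<char16_t>(0xD800 + (cp >> 10));
            out += static_cast<char16_t>(0xDC00 + (cp & 0x3FF));
        }
        i += len;
    }
    return out;
}
```

On Windows, where wchar_t is a 16-bit UTF-16 unit, the result could be handed to a CString via reinterpret_cast<const wchar_t*>(u16.c_str()); that cast is an assumption that only holds on that platform.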

@dota17
Member

dota17 commented Feb 11, 2020

@longlei, sorry for the belated reply.
You can try the new emitUTF8 feature, which was merged in #1045.
I guess it could help, but I haven't tried it.
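For context, emitUTF8 is a setting of Json::StreamWriterBuilder (builder["emitUTF8"] = true;), so it affects how jsoncpp writes JSON, not how it reads it. As I understand it, without the setting the writer escapes non-ASCII characters as \uXXXX sequences, and with it the UTF-8 bytes are written through unchanged. The sketch below is not jsoncpp's implementation; json_escape_non_ascii is a hypothetical helper that shows what the escaped form looks like, starting from UTF-16 because JSON \u escapes are UTF-16 code units.

```cpp
#include <string>

// Hypothetical helper: escape a UTF-16 string for a JSON string literal.
// Quote and backslash get a backslash; control characters and every
// non-ASCII code unit become \uXXXX. Characters outside the BMP are already
// surrogate pairs in UTF-16, so each half is escaped separately, as JSON requires.
std::string json_escape_non_ascii(const std::u16string& s) {
    static const char digits[] = "0123456789abcdef";
    std::string out;
    for (char16_t u : s) {
        if (u == u'"' || u == u'\\') {
            out += '\\';
            out += static_cast<char>(u);
        } else if (u >= 0x20 && u < 0x80) {
            out += static_cast<char>(u);  // printable ASCII passes through
        } else {
            out += "\\u";
            for (int shift = 12; shift >= 0; shift -= 4)
                out += digits[(u >> shift) & 0xF];  // four hex digits, high to low
        }
    }
    return out;
}
```

With this helper, "测试用软件" comes out as \u6d4b\u8bd5\u7528\u8f6f\u4ef6; with emitUTF8 enabled, the writer would instead emit the characters themselves as UTF-8.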

As for reading the UTF-8-encoded file and transcoding it into a Unicode string before parsing it with jsoncpp: that seems to be the only effective solution. I have no other idea.

@dota17
Member

dota17 commented Feb 17, 2020

@longlei
Does it help you, or are there other problems?

@dota17
Member

dota17 commented Feb 24, 2020

Closing due to inactivity. Feel free to reopen if this is still an issue.

@dota17 dota17 closed this as completed Feb 24, 2020
@yesiah

yesiah commented Apr 18, 2020

@longlei Not sure if you still care. I encountered the same thing with Japanese characters, and it turns out that @dota17 was right.
The short answer is: the content of your tmp variable is correct, but it is not displayed properly because Visual Studio uses another character set, which has a different character mapping than UTF-8.

Follow the link below to check whether your tmp variable is shown properly with ? tmp, s8
https://blogs.msmvps.com/gdicanio/2016/11/22/whats-wrong-with-my-utf-8-strings-in-visual-studio/
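To see how the same correct bytes end up looking garbled, one can decode them with a single-byte character set instead of UTF-8. The sketch below uses Latin-1 (ISO-8859-1) only because every byte maps to the code point of the same value, so no lookup table is needed; a Simplified Chinese Windows system typically uses code page 936 (GBK) as its ANSI code page, which produces different but equally wrong text. decode_as_latin1 is a hypothetical helper.

```cpp
#include <string>

// Hypothetical helper: deliberately misinterpret a byte string as Latin-1,
// so every byte becomes one UTF-16 unit with the same numeric value.
// Applied to UTF-8 input, this reproduces typical "mojibake".
std::u16string decode_as_latin1(const std::string& bytes) {
    std::u16string out;
    out.reserve(bytes.size());
    for (unsigned char b : bytes)
        out += static_cast<char16_t>(b);
    return out;
}
```

Fed the three UTF-8 bytes of 测 (e6 b5 8b), it returns three characters (U+00E6 æ, U+00B5 µ and the control character U+008B) instead of the single character U+6D4B.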

@longlei
Author

longlei commented Apr 18, 2020

Thanks a lot. I have solved the problem with your help.

@longlei
Author

longlei commented Apr 18, 2020

You are right. Thanks.

3 participants