From eb488bb4d9969eea53136515e4f90efb2785fdfc Mon Sep 17 00:00:00 2001
From: Niels Lohmann
Date: Sun, 1 Aug 2021 20:54:02 +0200
Subject: [PATCH] :memo: add note for wstring handling

---
 README.md                   |  1 +
 doc/mkdocs/docs/home/faq.md | 41 ++++++++++++++++++++++++++++++++++++-
 2 files changed, 41 insertions(+), 1 deletion(-)

diff --git a/README.md b/README.md
index 9f972c211..b717f9bee 100644
--- a/README.md
+++ b/README.md
@@ -1612,6 +1612,7 @@ The library supports **Unicode input** as follows:
 - Invalid surrogates (e.g., incomplete pairs such as `\uDEAD`) will yield parse errors.
 - The strings stored in the library are UTF-8 encoded. When using the default string type (`std::string`), note that its length/size functions return the number of stored bytes rather than the number of characters or glyphs.
 - When you store strings with different encodings in the library, calling [`dump()`](https://nlohmann.github.io/json/api/basic_json/dump/) may throw an exception unless `json::error_handler_t::replace` or `json::error_handler_t::ignore` are used as error handlers.
+- To store wide strings (e.g., `std::wstring`), you need to convert them to a UTF-8 encoded `std::string` first; see [an example](https://json.nlohmann.me/home/faq/#wide-string-handling).
 
 ### Comments in JSON
 
diff --git a/doc/mkdocs/docs/home/faq.md b/doc/mkdocs/docs/home/faq.md
index 23aa35a22..c6a283bc7 100644
--- a/doc/mkdocs/docs/home/faq.md
+++ b/doc/mkdocs/docs/home/faq.md
@@ -44,7 +44,7 @@ for objects.
 
 !!! question
 
-    - Can you add an option to ignore trailing commas?
+    Can you add an option to ignore trailing commas?
 
 This library does not support any feature which would jeopardize interoperability.
 
@@ -70,6 +70,45 @@ The library supports **Unicode input** as follows:
 
 In most cases, the parser is right to complain, because the input is not UTF-8 encoded. This is especially true for Microsoft Windows where Latin-1 or ISO 8859-1 is often the standard encoding.
 
+### Wide string handling
+
+!!! question
+
+    Why are wide strings (e.g., `std::wstring`) dumped as arrays of numbers?
+
+As described [above](#parse-errors-reading-non-ascii-characters), the library assumes UTF-8 encoding. To store a wide string, you need to convert it to UTF-8 first.
+
+!!! example
+
+    ```cpp
+    #include <codecvt> // codecvt_utf8
+    #include <locale>  // wstring_convert
+
+    // convert a wide string to a UTF-8 encoded std::string
+    std::string to_utf8(const std::wstring& wide_string)
+    {
+        static std::wstring_convert<std::codecvt_utf8<wchar_t>> utf8_conv;
+        return utf8_conv.to_bytes(wide_string);
+    }
+
+    json j;
+    std::wstring ws = L"車B1234 こんにちは";
+
+    j["original"] = ws;
+    j["encoded"] = to_utf8(ws);
+
+    std::cout << j << std::endl;
+    ```
+
+    The result is:
+
+    ```json
+    {
+        "encoded": "車B1234 こんにちは",
+        "original": [36554, 66, 49, 50, 51, 52, 32, 12371, 12435, 12395, 12385, 12399]
+    }
+    ```
+
 ## Exceptions
 
 ### Parsing without exceptions
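The FAQ example relies on `std::wstring_convert` and `std::codecvt_utf8`, both of which are deprecated since C++17. Where that matters, a hand-written encoder is one possible alternative. The following is a minimal sketch, not part of the library, and the name `wide_to_utf8` is hypothetical. It assumes `wchar_t` holds UTF-16 code units when it is 2 bytes wide (as on Windows) and UTF-32 code points otherwise (as on Linux and macOS). Invalid input is replaced with U+FFFD rather than rejected.

```cpp
#include <cstddef>
#include <string>

// Hypothetical helper (not part of the library): encode a std::wstring as
// UTF-8 without the deprecated std::wstring_convert. Assumes wchar_t holds
// UTF-16 code units when it is 2 bytes wide (e.g., Windows) and UTF-32 code
// points otherwise (e.g., Linux, macOS).
std::string wide_to_utf8(const std::wstring& wide)
{
    std::string out;
    out.reserve(wide.size());

    for (std::size_t i = 0; i < wide.size(); ++i)
    {
        char32_t cp = static_cast<char32_t>(wide[i]);

        // With a 2-byte wchar_t, join a high/low surrogate pair into one code point.
        if (sizeof(wchar_t) == 2 && cp >= 0xD800 && cp <= 0xDBFF && i + 1 < wide.size())
        {
            const char32_t low = static_cast<char32_t>(wide[i + 1]);
            if (low >= 0xDC00 && low <= 0xDFFF)
            {
                cp = 0x10000 + ((cp - 0xD800) << 10) + (low - 0xDC00);
                ++i;
            }
        }

        // Unpaired surrogates and out-of-range values become U+FFFD, so the
        // result is always valid UTF-8 (a design choice of this sketch).
        if ((cp >= 0xD800 && cp <= 0xDFFF) || cp > 0x10FFFF)
        {
            cp = 0xFFFD;
        }

        if (cp < 0x80)
        {
            // 1 byte: 0xxxxxxx
            out += static_cast<char>(cp);
        }
        else if (cp < 0x800)
        {
            // 2 bytes: 110xxxxx 10xxxxxx
            out += static_cast<char>(0xC0 | (cp >> 6));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else if (cp < 0x10000)
        {
            // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xE0 | (cp >> 12));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
        else
        {
            // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
            out += static_cast<char>(0xF0 | (cp >> 18));
            out += static_cast<char>(0x80 | ((cp >> 12) & 0x3F));
            out += static_cast<char>(0x80 | ((cp >> 6) & 0x3F));
            out += static_cast<char>(0x80 | (cp & 0x3F));
        }
    }
    return out;
}
```

With this helper, the FAQ line `j["encoded"] = to_utf8(ws);` would become `j["encoded"] = wide_to_utf8(ws);`. Because malformed input is replaced instead of passed through, the result should not make `dump()` throw for encoding reasons. The trade-off is that malformed input is silently altered.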