Repo sync for protected branch #5003

Merged
merged 8 commits into from
Mar 28, 2024
@@ -8,11 +8,14 @@ helpviewer_keywords: ["MFC [C++], character set support", "MBCS [C++], strings a

Some languages, for example, Japanese and Chinese, have large character sets. To support programming for these markets, the Microsoft Foundation Class Library (MFC) enables two different approaches to handling large character sets:

- [Unicode](#mfc-support-for-unicode-strings), **`wchar_t`** based wide-characters and strings encoded as UTF-16.
- [Unicode](#mfc-support-for-unicode-strings), **`wchar_t`** based wide-characters, and strings encoded as UTF-16.

- [Multibyte Character Sets (MBCS)](#mfc-support-for-mbcs-strings), **`char`** based single or double-byte characters and strings encoded in a locale-specific character set.

Microsoft has recommended the MFC Unicode libraries for all new development, and the MBCS libraries were deprecated in Visual Studio 2013 and Visual Studio 2015. This is no longer the case. The MBCS deprecation warnings have been removed in Visual Studio 2017.
> [!NOTE]
> Microsoft recommends the MFC Unicode libraries for all new development.\
> The MBCS libraries were deprecated in Visual Studio 2013 and Visual Studio 2015. This is no longer the case.\
> Starting with Visual Studio 2017, the MBCS libraries are no longer deprecated and don't generate deprecation warnings.

## MFC Support for Unicode Strings

@@ -51,13 +54,13 @@ These library, debugger, and DLL files are used to support Unicode in MFC:

(*version* represents the version number of the file; for example, '140' means version 14.0.)

`CString` is based on the `TCHAR` data type. If the symbol `_UNICODE` is defined for a build of your program, `TCHAR` is defined as type **`wchar_t`**, a 16-bit character encoding type. Otherwise, `TCHAR` is defined as **`char`**, the normal 8-bit character encoding. Therefore, under Unicode, a `CString` is composed of 16-bit characters. Without Unicode, it is composed of characters of type **`char`**.
`CString` is based on the `TCHAR` data type. If the symbol `_UNICODE` is defined for a build of your program, `TCHAR` is defined as type **`wchar_t`**, a 16-bit character encoding type. Otherwise, `TCHAR` is defined as **`char`**, the normal 8-bit character encoding. Therefore, under Unicode, a `CString` is composed of 16-bit characters. Without Unicode, it's composed of characters of type **`char`**.
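
As a rough illustration of the mapping described above, the following sketch mimics how `TCHAR` and `_T` switch on `_UNICODE`, and why a length in characters differs from a length in bytes. The names `SketchChar`, `SKETCH_T`, `SketchLength`, and `SketchByteLength` are hypothetical stand-ins, introduced so the example compiles without `<tchar.h>`; they aren't part of MFC or the C run-time library.

```cpp
#include <cstddef>

// Hypothetical stand-ins for TCHAR and _T (assumption: not the real
// <tchar.h> definitions, only the same selection on _UNICODE).
#ifdef _UNICODE
typedef wchar_t SketchChar;
#define SKETCH_T(s) L##s
#else
typedef char SketchChar;
#define SKETCH_T(s) s
#endif

// Length in characters (code units) of a NUL-terminated string,
// not counting the terminator.
inline std::size_t SketchLength(const SketchChar* s) {
    std::size_t n = 0;
    while (s[n] != 0) {
        ++n;
    }
    return n;
}

// Length in bytes of the same string, not counting the terminator.
// This differs from SketchLength whenever sizeof(SketchChar) > 1.
inline std::size_t SketchByteLength(const SketchChar* s) {
    return SketchLength(s) * sizeof(SketchChar);
}
```

With `_UNICODE` defined under the Microsoft compiler, where **`wchar_t`** is 16 bits, `SketchLength(SKETCH_T("MFC"))` is 3 while `SketchByteLength(SKETCH_T("MFC"))` is 6. Without `_UNICODE`, both are 3. On other platforms **`wchar_t`** may be wider, so the byte count may differ.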

To complete Unicode programming of your application, you must also:

- Use the `_T` macro to conditionally code literal strings to be portable to Unicode.

- When you pass strings, pay attention to whether function arguments require a length in characters or a length in bytes. The difference is important if you are using Unicode strings.
- When you pass strings, pay attention to whether function arguments require a length in characters or a length in bytes. The difference is important if you're using Unicode strings.

- Use portable versions of the C run-time string-handling functions.

@@ -77,9 +80,9 @@ The [Run-Time Library Reference](../c-runtime-library/c-run-time-library-referen

The class library is also enabled for multibyte character sets, but only for double-byte character sets (DBCS).

In a multibyte character set, a character can be one or two bytes wide. If it is two bytes wide, its first byte is a special "lead byte" that is chosen from a particular range, depending on which code page is in use. Taken together, the lead and "trail bytes" specify a unique character encoding.
In a multibyte character set, a character can be one or 2 bytes wide. If it's 2 bytes wide, its first byte is a special "lead byte" that is chosen from a particular range, depending on which code page is in use. Taken together, the lead and "trail bytes" specify a unique character encoding.

If the symbol `_MBCS` is defined for a build of your program, type `TCHAR`, on which `CString` is based, maps to **`char`**. It is up to you to determine which bytes in a `CString` are lead bytes and which are trail bytes. The C run-time library supplies functions to help you determine this.
If the symbol `_MBCS` is defined for a build of your program, type `TCHAR`, on which `CString` is based, maps to **`char`**. It's up to you to determine which bytes in a `CString` are lead bytes and which are trail bytes. The C run-time library supplies functions to help you determine this.

Under DBCS, a given string can contain all single-byte ANSI characters, all double-byte characters, or a combination of the two. These possibilities require special care in parsing strings. This includes `CString` objects.
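
To make the lead-byte and trail-byte distinction concrete, here's a minimal sketch that counts characters in a mixed single-byte and double-byte string. It assumes code page 932 (Shift-JIS), whose lead bytes fall in the ranges 0x81 through 0x9F and 0xE0 through 0xFC. The helper names `SketchIsLeadByte932` and `SketchCountChars932` are hypothetical and hard-code that one code page. Real code would more likely rely on the C run-time library's MBCS routines, such as `_ismbblead` and `_mbsinc`, which follow the active code page.

```cpp
#include <cstddef>

// Hypothetical predicate (assumption: code page 932 only).
// Returns true if the byte is in one of the lead-byte ranges.
inline bool SketchIsLeadByte932(unsigned char b) {
    return (b >= 0x81 && b <= 0x9F) || (b >= 0xE0 && b <= 0xFC);
}

// Counts characters, not bytes, in a NUL-terminated DBCS string.
inline std::size_t SketchCountChars932(const char* s) {
    std::size_t count = 0;
    while (*s != '\0') {
        const unsigned char b = static_cast<unsigned char>(*s);
        if (SketchIsLeadByte932(b) && s[1] != '\0') {
            s += 2;  // lead byte plus trail byte form one character
        } else {
            s += 1;  // single-byte character (or a dangling lead byte)
        }
        ++count;
    }
    return count;
}
```

For example, the 4-byte string `"A\x82\xA0" "B"` holds three characters under this code page: `A`, one double-byte character, and `B`. The literal is split in two so that `B` isn't read as part of the hexadecimal escape.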
