Text caret position shifts when typing Japanese #447

Closed
processing-bot opened this issue Mar 12, 2022 · 10 comments
Labels
has attachment (Attachment was not transferred from GitLab), help wanted (Extra attention is needed)

Comments

@processing-bot
Collaborator

Created by: TN8001

This was translated by a machine.
Sorry for the delay in reporting this; I had stayed on beta1 because of the problems in #403.

Description

Japanese characters are usually twice as wide as alphabetic characters (even in a monospace font!).
The caret position drifts because every character is treated as having the same width.

In many fonts one Japanese character is exactly as wide as two alphabetic characters, but in some fonts it is not.
[Image: fontsize]
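
The drift described above can be worked out by hand. The following is a minimal sketch, not Processing's actual code: it assumes a hypothetical font in which half-width characters are 8 px wide and full-width characters are 16 px wide, and it compares a caret x position computed from one fixed width with one computed by summing per-character widths. The names `CaretDrift`, `fixedCaretX`, and `summedCaretX` are made up for illustration.

```java
import java.util.function.IntUnaryOperator;

final class CaretDrift {
  // Assumed metrics: 8 px for half-width glyphs, 16 px for full-width glyphs.
  static final int HALF = 8;
  static final int FULL = 16;

  // Crude stand-in for a font's width table: everything outside Latin-1
  // is treated as full-width. A real editor would ask the font.
  static final IntUnaryOperator WIDTH_OF = ch -> ch < 0x100 ? HALF : FULL;

  // Caret x when every character is assumed to be HALF wide.
  static int fixedCaretX(String line, int offset) {
    return offset * HALF;
  }

  // Caret x when each character contributes its own width.
  static int summedCaretX(String line, int offset) {
    int x = 0;
    for (int i = 0; i < offset; i++) {
      x += WIDTH_OF.applyAsInt(line.charAt(i));
    }
    return x;
  }

  public static void main(String[] args) {
    // "// " followed by three kanji (U+65E5 U+672C U+8A9E) and " ok".
    String line = "// \u65E5\u672C\u8A9E ok";
    int offset = 6; // caret just after the third kanji
    System.out.println(fixedCaretX(line, offset));  // 48
    System.out.println(summedCaretX(line, offset)); // 72
  }
}
```

With these assumed widths the fixed-width caret is drawn 24 px to the left of the text it belongs to, and the gap grows by 8 px for every further full-width character on the line.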

Expected Behavior

There was no problem with beta4.

Processing.4.0b4.mp4

Current Behavior

The video is beta5, but the same applies to beta6 and beta7.

Processing.4.0b5.mp4

Your Environment

  • Processing version: 4.0b5 or later
  • Operating System and OS version: Windows 10

[Image: settings]

@processing-bot
Collaborator Author

Created by: benfry

This is likely due to the fixes for #194, #226, #342 which were major problems that prevented people from working at all.

Do I understand correctly that Japanese works as long as you use a truly monospace font? Does it work correctly if you use the default monospaced font (meaning that you don't change anything in Preferences)? I'm just trying to understand the severity of the issue.

@processing-bot
Collaborator Author

Created by: TN8001

> Do I understand correctly that Japanese works as long as you use a truly monospace font?

As far as I know, no Japanese font is truly monospaced in the sense that one Japanese character is as wide as one alphabetic character.
Kanji have so many strokes that they could not be drawn or read at that width.
This is an exaggerated example (not a commonly used kanji), but kanji require a larger width than the alphabet.
Taito (kanji) - Uncyclopedia

> Does it work correctly if you use the default monospaced font (meaning that you don't change anything in Preferences)?

To restore the default settings, do I just delete "C:\Users\[UserName]\AppData\Roaming\Processing"?

Language: Japanese
Font: "Source Code Pro" (not sure why this was chosen)

However, it does not do font fallback, so when I type Japanese, I get substitute characters (small rectangles).
To display Japanese, you must select a font that contains Japanese glyphs.

> I'm just trying to understand the severity of the issue.

There is no problem if you do not enter Japanese, but you may want to enter comments or text strings.
It is still fine while typing, but very frustrating when trying to edit.

Chinese would be the same, and I am sure there are other languages that are affected (I am not familiar with the other languages).

@processing-bot
Collaborator Author

Created by: benfry

I see… I wasn't aware that it wasn't a true monospace—I hadn't noticed that the larger/wider Latin characters used with the CJKV “monospace” fonts weren't the same width (they're very wide for Latin!) So we'll need to do some work to update the fixes for those other bugs (which finally got us reliable widths across devices, OS zoom, low-res/hidpi/retina) with a way to handle the changing widths. Thanks for the report.

@processing-bot
Collaborator Author

Created by: Sardtok

The Chinese and Japanese characters themselves are monospaced, but the Latin characters are typically "half-width". There are also "full-width" Latin characters, but they have separate code points from the ASCII/ANSI characters. There are a few small "combining" characters in the syllabic alphabets. They are not combining in the Unicode sense, since they are separate characters, but they read as a single sound together with the character they modify. They are still the same width, with a bit of extra whitespace on one side. One of the syllabic alphabets can also be input in half-width:

ＡＡＡ
アアア
AAA
ｱｱｱ

These are all A's: first full-width Latin, then full-width katakana, half-width (ASCII) Latin, and half-width katakana.
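
The separate code points can be inspected directly. This is a small sketch using only the standard `String.codePointAt` and `java.text.Normalizer` APIs; the class name `WidthForms` and its helper methods are illustrative. It prints the code point of each of the four kinds of "A" and what NFKC compatibility normalization folds it to.

```java
import java.text.Normalizer;

final class WidthForms {
  // Formats the first code point of s as U+XXXX.
  static String codePoint(String s) {
    return String.format("U+%04X", s.codePointAt(0));
  }

  // NFKC folds full-width Latin to ASCII and half-width katakana
  // to ordinary (full-width) katakana.
  static String fold(String s) {
    return Normalizer.normalize(s, Normalizer.Form.NFKC);
  }

  public static void main(String[] args) {
    String[] samples = {
      "\uFF21", // full-width Latin A
      "\u30A2", // full-width katakana A
      "A",      // half-width (ASCII) Latin A
      "\uFF71"  // half-width katakana A
    };
    for (String s : samples) {
      System.out.println(codePoint(s) + " -> " + codePoint(fold(s)));
    }
  }
}
```

Because the full-width and half-width forms are distinct characters, an editor cannot infer the width from the letter alone; it has to look at the actual code point or at the font's metrics for it.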

So, it's complicated. I think Japanese is probably the most complicated of the CJK languages (EDIT: layout-wise). Chinese uses only Chinese characters. Korean doesn't use Chinese characters much anymore and sticks to Hangul, a monospaced syllabic alphabet. Japanese uses a mix of Chinese characters, two syllabic alphabets, and sometimes the Latin alphabet; the syllabic alphabets and the Chinese characters are monospaced, except for half-width katakana.

Some punctuation, and some small vs. large versions of characters where the small versions combine with other characters:

つつつ
っっっ
よよよ
ょょょ
。。。
、、、
？？？

I cannot speak for other Asian scripts like Thai, or for that matter the Arabic script families which use diacritics for vowels that modify the whole size of the consonant pattern characters, and writing direction gets really weird when mixing LTR and RTL in those scripts (that's as far as my knowledge of the Arabic languages stretches from my general linguistics courses from ages ago).

@processing-bot
Collaborator Author

Created by: benfry

Understood; like I said, I didn't realize the already wider-than-normal Latin characters weren't actually full width.

Anyone want to help with a fix? It's a combination of the hard-coded width handling that resulted from the issues I listed above, and probably an extra update or two inside the processing.app.syntax.im package. As found in the resolution to those other issues, we can't trust what the OS tells us about the width of characters, so we'll need to use the same handling for CJK-style languages.
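
One conceivable way to keep a single trusted cell width while still handling wide characters is to count each character as one or two cells. The sketch below is only an illustration of that idea; nothing in this thread says Processing adopted it. The class `CellWidth`, its methods, and the chosen ranges are assumptions, and the ranges are a rough, deliberately incomplete approximation of the Unicode East Asian Width property.

```java
final class CellWidth {
  // Returns 2 for code points that are normally drawn full-width, 1 otherwise.
  // Approximation only: the ranges below do not cover the full Unicode table.
  static int cells(int cp) {
    // Half-width katakana, punctuation, and Hangul forms take one cell.
    if (cp >= 0xFF61 && cp <= 0xFFDC) return 1;
    // Full-width ASCII variants and full-width signs.
    if ((cp >= 0xFF01 && cp <= 0xFF60) || (cp >= 0xFFE0 && cp <= 0xFFE6)) return 2;
    // CJK Symbols and Punctuation, Hiragana, and Katakana blocks.
    if (cp >= 0x3000 && cp <= 0x30FF) return 2;
    Character.UnicodeScript script = Character.UnicodeScript.of(cp);
    if (script == Character.UnicodeScript.HAN
        || script == Character.UnicodeScript.HANGUL) {
      return 2;
    }
    return 1;
  }

  // Width of a line in pixels, given a trusted width for one half-width cell.
  static int lineWidth(String line, int cellPx) {
    return line.codePoints().map(CellWidth::cells).sum() * cellPx;
  }

  public static void main(String[] args) {
    // 'A' plus hiragana A (U+3042): 1 + 2 cells at 8 px each.
    System.out.println(lineWidth("A\u3042", 8)); // 24
  }
}
```

This only works for fonts where a full-width glyph really is two cells wide. As the original report notes, some fonts break that assumption, in which case per-character font metrics are needed instead.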

@processing-bot
Collaborator Author

Created by: TN8001

> Anyone want to help with a fix?

This is a serious issue for me and I would love to help.

However, I am not familiar with the detailed history of the changes so far.
I do not have a mac.
I have a single 2K (1920x1080) monitor, so I don't use HiDPI regularly.

> It's a combination of the hard-coded width handling that resulted from the issues I listed above, and probably an extra update or two inside the processing.app.syntax.im package.

The cause itself is clear.
Since all characters are calculated as space-widths, full-width characters will be shifted.
processing4/JEditTextArea.java#L1418

I changed it to use the width of each individual character:

  // Width in pixels of the segment when drawn starting at x, measuring each
  // character with the font metrics instead of assuming one fixed width.
  static int getTabbedTextWidth(Segment s,
                                FontMetrics metrics, int x,
                                TabExpander e, int startOffset) {
    int nextX = x;
    char[] txt = s.array;
    int txtOffset = s.offset;
    int n = s.offset + s.count;

    for (int i = txtOffset; i < n; i++) {
      int charWidth = metrics.charWidth(txt[i]);
      if (txt[i] == '\t') {
        if (e != null) {
          // Jump to the next tab stop.
          nextX = (int) e.nextTabStop(nextX, startOffset + i - txtOffset);
        } else {
          nextX += charWidth;
        }
      } else if (txt[i] == '\n') {
        // A newline contributes no width.
      } else {
        nextX += charWidth;
      }
    }
    return nextX - x;
  }

Perhaps this is sufficient?

  // Simplified variant: only a tab with a tab expander is special-cased.
  static int getTabbedTextWidth(Segment s,
                                FontMetrics metrics, int x,
                                TabExpander e, int startOffset) {
    int nextX = x;
    char[] txt = s.array;
    int txtOffset = s.offset;
    int n = s.offset + s.count;

    for (int i = txtOffset; i < n; i++) {
      if (txt[i] == '\t' && e != null) {
        // Jump to the next tab stop.
        nextX = (int) e.nextTabStop(nextX, startOffset + i - txtOffset);
        continue;
      }
      // Every other character is measured with the font metrics.
      nextX += metrics.charWidth(txt[i]);
    }
    return nextX - x;
  }

It works fine in my environment (Windows 10 at 100% and 125% scaling), but
I am not certain that this is the right fix.
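
To see how the simplified loop treats tabs and wide characters together, here is a standalone sketch that mirrors its structure without AWT or Swing. `WidthFn` and `TabStops` are hypothetical stand-ins for `FontMetrics.charWidth` and `TabExpander.nextTabStop` (the real `nextTabStop` also takes a tab offset), and the 8 px and 16 px widths and the 32 px tab stops are assumed values.

```java
final class TabbedWidthSketch {
  interface WidthFn { int charWidth(char c); }       // stand-in for FontMetrics
  interface TabStops { float nextTabStop(float x); } // stand-in for TabExpander

  // Same shape as the simplified loop: tabs snap to the next stop when an
  // expander is available; everything else adds its own measured width.
  static int tabbedTextWidth(char[] txt, int offset, int count,
                             WidthFn metrics, int x, TabStops tabs) {
    int nextX = x;
    for (int i = offset; i < offset + count; i++) {
      if (txt[i] == '\t' && tabs != null) {
        nextX = (int) tabs.nextTabStop(nextX);
        continue;
      }
      nextX += metrics.charWidth(txt[i]);
    }
    return nextX - x;
  }

  public static void main(String[] args) {
    WidthFn metrics = c -> c < 0x100 ? 8 : 16;     // assumed half/full widths
    TabStops tabs = x -> ((int) x / 32 + 1) * 32;  // assumed stop every 32 px
    char[] line = "\u3042\tb".toCharArray();       // hiragana A, tab, 'b'
    // 16 px glyph, tab snaps to 32, then 8 px for 'b'.
    System.out.println(tabbedTextWidth(line, 0, line.length, metrics, 0, tabs)); // 40
  }
}
```

When no tab expander is supplied, the tab falls through to the metrics lookup like any other character.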

@processing-bot
Collaborator Author

Created by: Sardtok

I might be able to test your fix on various DPI Windows PCs and Macs later tonight.

@processing-bot
Collaborator Author

Created by: Sardtok

Seems to work fine both on Windows 10 at 100%, 150%, and 200% scaling, and on macOS Monterey with the built-in Retina display (pixel density 2 in sketches).

No offsetting of the caret in either direction.

The only thing that was a bit weird on my Mac was full-width Latin characters, which had no metric changes from ordinary Latin characters. The glyph looked different, but there was no spacing around it. Could be something with the Osaka font on macOS, but it's hard to say. This is not related to this patch, as it looks the same without it.

@TN8001 you should add a pull request to apply your patch.

@processing-bot
Collaborator Author

Created by: TN8001

Thank you for confirming.

> you should add a pull request to apply your patch.

I will have it ready by the weekend.

@processing-bot
Collaborator Author

Created by: github-actions[bot]

This issue has been automatically locked. To avoid confusion with reports that have already been resolved, closed issues are automatically locked 30 days after the last comment. Please open a new issue for related bugs.

github-actions bot locked this issue as resolved and limited conversation to collaborators on Sep 17, 2024