relying on default charset to be consistent across instances #336

Closed
processing-bot opened this issue Jan 5, 2022 · 5 comments

Created by: ratchetfreak

https://github.com/processing/processing/blob/a6a86250ffa4041b7d77fbb66dc02e78360c64f0/java/libraries/net/src/processing/net/Client.java#L718-L720

java.lang.String.getBytes() relies on the platform default charset, which depends on the JVM and OS configuration and is therefore not necessarily consistent between instances. This should be fixed to a portable standard charset:

  // requires: import java.nio.charset.StandardCharsets;
  public String readString() {
    byte[] b = readBytes();
    if (b == null) return null;
    return new String(b, StandardCharsets.UTF_8); // or whatever charset you want to fix it to
  }

  public void write(String data) {
    write(data.getBytes(StandardCharsets.UTF_8)); // must match readString
  }
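
To illustrate why both sides need the same charset, here is a minimal standalone sketch. The class `CharsetRoundTrip` and its `encode`/`decode` helpers are hypothetical stand-ins for `write(String)` and `readString()`; they are not part of the Processing `Client` class.

```java
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

public class CharsetRoundTrip {
  // Hypothetical stand-in for write(String): encode with an explicit charset.
  public static byte[] encode(String data) {
    return data.getBytes(StandardCharsets.UTF_8);
  }

  // Hypothetical stand-in for readString(): decode with the same charset.
  public static String decode(byte[] b) {
    if (b == null) return null;
    return new String(b, StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    String original = "caf\u00e9"; // "café"; the last char is outside ASCII
    byte[] wire = encode(original);

    System.out.println(Arrays.toString(wire));          // [99, 97, 102, -61, -87]
    System.out.println(decode(wire).equals(original));  // true

    // A receiver that decodes with a different charset gets different text.
    String mismatched = new String(wire, StandardCharsets.ISO_8859_1);
    System.out.println(mismatched.equals(original));    // false
  }
}
```

The last check is what can happen when the two ends rely on different default charsets: the bytes arrive intact, but the decoded text does not match.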

Created by: benfry

Hm, this is noted in the code:

  /**
   * <h3>Advanced</h3>
   * Write a String to the output. Note that this doesn't account
   * for Unicode (two bytes per char), nor will it send UTF8
   * characters.. It assumes that you mean to send a byte buffer
   * (most often the case for networking and serial i/o) and
   * will only use the bottom 8 bits of each char in the string.
   * (Meaning that internally it uses String.getBytes)
   *
   * If you want to move Unicode data, you can first convert the
   * String to a byte stream in the representation of your choice
   * (i.e. UTF8 or two-byte Unicode data), and send it as a byte array.
   */

and also applies to readString(). Though that was a decision made ~20 years ago and UTF-8 is waaaay more prominent now. And it does seem like consistent behavior is better than undefined behavior.

I'll sleep on it, but it seems like a good idea to change.
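
For reference, the workaround described in that Javadoc (convert the String yourself, then send the bytes) might look like the sketch below. The class and method names are hypothetical; only `String.getBytes(Charset)` and `StandardCharsets` are standard Java API. `UTF_16BE` is used for the "two-byte" case because `StandardCharsets.UTF_16` prepends a byte-order mark when encoding.

```java
import java.nio.charset.StandardCharsets;

public class ExplicitEncoding {
  // UTF-8: one byte for ASCII, two or more for other characters.
  public static byte[] toUtf8(String s) {
    return s.getBytes(StandardCharsets.UTF_8);
  }

  // UTF-16 big-endian without a byte-order mark: two bytes per char.
  public static byte[] toUtf16(String s) {
    return s.getBytes(StandardCharsets.UTF_16BE);
  }

  public static void main(String[] args) {
    String s = "h\u00e9"; // "hé"
    System.out.println(toUtf8(s).length);   // 3
    System.out.println(toUtf16(s).length);  // 4
    // Either array could then be passed to a byte-array write method,
    // as long as the receiver decodes with the same charset.
  }
}
```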


Created by: dzaima

Note that Java 18 defaults to UTF-8, so it's not that far away from when this won't be a problem anymore, and gives a good reason to change things in the meantime.
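
A quick way to see what a given JVM uses by default is sketched below. The class name `DefaultCharsetCheck` is hypothetical; the output depends on the Java version and platform settings, so no expected values are shown.

```java
import java.nio.charset.Charset;
import java.nio.charset.StandardCharsets;

public class DefaultCharsetCheck {
  // True if the no-argument String.getBytes() would encode as UTF-8 on this JVM.
  public static boolean defaultIsUtf8() {
    return Charset.defaultCharset().equals(StandardCharsets.UTF_8);
  }

  public static void main(String[] args) {
    System.out.println("Java version:     " + System.getProperty("java.version"));
    System.out.println("Default charset:  " + Charset.defaultCharset().name());
    System.out.println("Default is UTF-8: " + defaultIsUtf8());
  }
}
```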


Created by: benfry

Good point, will go with that.


Created by: benfry

Changed for 4.0 beta 3 with aab8d8f

@github-actions github-actions bot locked as resolved and limited conversation to collaborators Sep 17, 2024