View Issue Details
ID | Project | Category | View Status | Date Submitted | Last Update |
---|---|---|---|---|---|
0021101 | mantisbt | bugtracker | public | 2016-06-13 00:49 | 2017-10-26 07:04 |
Reporter | vboctor | Assigned To | dregad | ||
Priority | normal | Severity | minor | Reproducibility | always |
Status | closed | Resolution | fixed | ||
Product Version | 1.3.0-rc.2 | ||||
Target Version | 1.3.0 | Fixed in Version | 1.3.0 | ||
Summary | 0021101: Issues with emojis are truncated before getting saved | ||||
Description | The following line gets truncated at the emoji after "Z3" when it is saved to the database and when the email is sent, even though the text I pasted in continues after that point:
Not compatible with my Xperia Z3 | ||||
Tags | mantishub | ||||
Attached Files | emoji_text.txt (97 bytes)
Not compatible with my Xperia Z3 😢😢 any help would be great as this game looks amazing 👍 | ||||
I added the text you sent by e-mail as a UTF-8 text file attachment for the record. Emoji are stored as 4-byte Unicode characters, so I would guess that the issue is a side effect of our using MySQL's 'UTF8' charset, which only supports characters of up to 3 bytes. See 0020431 and more specifically my note 0020431:0052209: "1. [...] (eventually, someone will face issues as they try to store 4-byte unicode chars, e.g. emoji or some CJK characters)". |
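To illustrate the byte-length point above, here is a minimal PHP sketch. The function name `has_4byte_utf8()` is hypothetical and not part of MantisBT; it only shows that code points from U+10000 upward take 4 bytes in UTF-8, which is what a 3-byte charset cannot hold.

```php
<?php
// strlen() counts bytes, not characters, so it exposes the UTF-8 width.
$t_euro = "\xE2\x82\xAC";      // U+20AC EURO SIGN, 3 bytes
$t_cry  = "\xF0\x9F\x98\xA2";  // U+1F622 CRYING FACE, 4 bytes

printf( "euro sign: %d bytes\n", strlen( $t_euro ) );
printf( "crying face: %d bytes\n", strlen( $t_cry ) );

/**
 * Hypothetical helper: true if the string contains at least one
 * code point >= U+10000, i.e. a character encoded as 4 bytes in UTF-8.
 * Assumes $p_string is valid UTF-8 (preg_match() returns false otherwise).
 */
function has_4byte_utf8( $p_string ) {
    return preg_match( '/[\x{10000}-\x{10FFFF}]/u', $p_string ) === 1;
}
```

With the reported text, `has_4byte_utf8( "Not compatible with my Xperia Z3 \xF0\x9F\x98\xA2" )` would flag the string as one that could be truncated.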
|
I've tried this string with other services and some of them replace the emojis with ??? but don't truncate the text. Can we do a similar workaround until the db support is done? |
|
Certainly. I believe the simplest approach would be to replace any UTF-8 char >= U+10000 with a given character or string (I'd suggest we use U+FFFD - �). The question is, do we also need/want to store the original character somehow? E.g. for the crying face example you reported, we could replace it with something like '�[U+1f622]'. I'm not sure it's worth the effort. That could make the display look bad if echoed as-is, especially if there are a lot of "invalid" characters (e.g. a sentence in Chinese) but on the other hand it would allow us to
This being a workaround, to minimize the impact on the code base, I would also apply it only to selected key fields; I would say: bug summary, description, steps to reproduce, additional info and bugnote text. Let me know your thoughts. |
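The two options discussed in this note could look roughly like the sketch below. Both function names are hypothetical and this is not the proof-of-concept code itself; it assumes valid UTF-8 input (the `preg_*` functions return null otherwise).

```php
<?php
/**
 * Simple variant: replace every code point >= U+10000 with U+FFFD.
 * Hypothetical name, for illustration only.
 */
function replace_4byte_simple( $p_string ) {
    return preg_replace( '/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $p_string );
}

/**
 * Tagged variant: replace with U+FFFD followed by the original
 * code point in hex, e.g. "\xEF\xBF\xBD[U+1f622]".
 */
function replace_4byte_tagged( $p_string ) {
    return preg_replace_callback(
        '/[\x{10000}-\x{10FFFF}]/u',
        function ( $p_match ) {
            // Decode the 4-byte UTF-8 sequence into its code point.
            $t_b = array_values( unpack( 'C4', $p_match[0] ) );
            $t_code = ( ( $t_b[0] & 0x07 ) << 18 )
                    | ( ( $t_b[1] & 0x3F ) << 12 )
                    | ( ( $t_b[2] & 0x3F ) << 6 )
                    | ( $t_b[3] & 0x3F );
            return "\xEF\xBF\xBD" . sprintf( '[U+%x]', $t_code );
        },
        $p_string
    );
}
```

The tagged variant keeps enough information to restore the character later, at the cost of noisier output when the text is echoed as-is.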
|
Proof-of-concept: see attached screenshot 'Selection_002.png' |
|
Looks good. I would go with the simple approach of replacing 4-byte Unicode characters with �, similar to what you have done in the proof of concept. |
|
OK then. I'll submit a pull request after applying the workaround to the 3 bug fields. I will also need to check whether this causes issues in the history and bug_revision tables as well. |
|
For the record, a couple of helper functions I used while testing:
function utf8_ord( $p_char ) {
} |
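The bodies of those helpers are not preserved in this page. As a rough idea of what such test helpers might do, here is a hypothetical sketch (deliberately using different names, and assuming the mbstring extension is loaded) that converts between a UTF-8 character and its Unicode code point:

```php
<?php
/**
 * Hypothetical sketch, not the original helper: return the Unicode
 * code point of the first character of a non-empty UTF-8 string.
 */
function utf8_ord_sketch( $p_char ) {
    // UCS-4BE stores each code point as a 4-byte big-endian integer.
    $t_ucs4 = mb_convert_encoding( $p_char, 'UCS-4BE', 'UTF-8' );
    $t_unpacked = unpack( 'N', $t_ucs4 );
    return $t_unpacked[1];
}

/**
 * Hypothetical inverse: return the UTF-8 encoding of a code point.
 */
function utf8_chr_sketch( $p_code ) {
    return mb_convert_encoding( pack( 'N', $p_code ), 'UTF-8', 'UCS-4BE' );
}
```

For example, `utf8_chr_sketch( 0x1F622 )` yields the crying face used in the report, which is handy for building test strings without pasting emoji into source files.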
|
MantisBT: master-1.3.x 805ef0cb 2016-06-18 12:42 |
New database API function db_mysql_fix_utf8()

This new function replaces 4-byte UTF-8 chars by Unicode U+FFFD character for MySQL databases.

This is a temporary workaround to avoid data getting truncated on MySQL databases using native utf8 encoding which only supports 3 bytes chars, until we're able to support utf8mb4 charset (see issue 0020431).

Fixes 0021101 |
Affected Issues 0020431, 0021101 |
|
mod - core/database_api.php |
MantisBT: master-1.3.x 4dcb16cc 2016-06-18 12:48 |
Fix 4-byte UTF-8 chars issues on MySQL

This applies the new db_mysql_fix_utf8() function to the following fields:
- bug.summary
- bug.description
- bug.steps_to_reproduce
- bug.additional_information
- bugnote.text
- custom fields

Fixes 0021101 |
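Limiting the workaround to a fixed set of fields could be sketched as follows. This is an illustration under assumptions, not the committed code: `fix_utf8_sketch()` merely stands in for db_mysql_fix_utf8(), whose real implementation in core/database_api.php is not reproduced here, and `sanitize_selected_fields()` and the array layout are hypothetical.

```php
<?php
/**
 * Stand-in for the sanitizer (hypothetical): replaces code points
 * >= U+10000 with U+FFFD. Assumes valid UTF-8 input.
 */
function fix_utf8_sketch( $p_string ) {
    return preg_replace( '/[\x{10000}-\x{10FFFF}]/u', "\xEF\xBF\xBD", $p_string );
}

/**
 * Apply the sanitizer only to the listed keys of a row, leaving
 * every other value untouched.
 */
function sanitize_selected_fields( array $p_row, array $p_fields ) {
    foreach ( $p_fields as $t_field ) {
        if ( isset( $p_row[$t_field] ) && is_string( $p_row[$t_field] ) ) {
            $p_row[$t_field] = fix_utf8_sketch( $p_row[$t_field] );
        }
    }
    return $p_row;
}

// Field names taken from the commit message above; the row is made up.
$t_bug_fields = array( 'summary', 'description', 'steps_to_reproduce', 'additional_information' );
$t_row = array(
    'summary'     => "Xperia Z3 \xF0\x9F\x98\xA2 help",
    'description' => 'No emoji here',
    'os'          => "\xF0\x9F\x91\x8D",
);
$t_clean = sanitize_selected_fields( $t_row, $t_bug_fields );
```

Keeping the field list explicit matches the stated goal of minimizing the impact on the code base: anything not listed, such as the made-up `os` key here, is passed through unchanged.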
Affected Issues 0021101 |
|
mod - core/bug_api.php |
mod - core/bugnote_api.php |
mod - core/cfdefs/cfdef_standard.php |
mod - core/custom_field_api.php |