readCodePointValue
Decodes a single code point value from UTF-8 code units, reading between 1 and 4 bytes as necessary.
If this source is exhausted before a complete code point can be read, this throws an EOFException and consumes no input.
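The following sketch illustrates the end-of-input rule. It assumes kotlinx-io-core is available, and `sizeAfterTruncatedRead` is a hypothetical helper, not part of kotlinx.io. The function writes only the lead byte of the two-byte sequence 0xCE 0x94 (U+0394), expects the read to fail with EOFException, and reports how many bytes are still buffered afterwards.

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.EOFException
import kotlinx.io.readCodePointValue

// Hypothetical helper (not part of kotlinx.io): attempts to decode an incomplete
// two-byte sequence and returns the buffer size observed after the failed read.
fun sizeAfterTruncatedRead(): Long {
    val buffer = Buffer()
    buffer.writeByte(0xce.toByte()) // lead byte announces a 2-byte sequence, but the second byte is missing
    try {
        buffer.readCodePointValue()
        error("expected EOFException") // not reached if the source is exhausted as documented
    } catch (e: EOFException) {
        // expected: the code point could not be completed
    }
    return buffer.size // no input should have been consumed
}
```

If the documented behavior holds, the helper returns 1 because the lone lead byte is left in the buffer.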
If this source starts with an ill-formed UTF-8 code unit sequence, this method removes 1 or more non-UTF-8 bytes and returns the replacement character (U+fffd).
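The sketch below decodes a whole byte array so that the replacement characters produced for ill-formed input become visible. It assumes kotlinx-io-core is available, and `decodeAll` is a hypothetical helper, not part of kotlinx.io.

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.readCodePointValue

// Hypothetical helper (not part of kotlinx.io): decodes every code point in [bytes]
// so that replacement characters (0xfffd) produced for ill-formed input are visible.
fun decodeAll(bytes: ByteArray): List<Int> {
    val buffer = Buffer()
    buffer.write(bytes) // copy the raw bytes into the buffer
    val result = mutableListOf<Int>()
    while (!buffer.exhausted()) {
        result += buffer.readCodePointValue()
    }
    return result
}
```

With the input 0xFF 0x41, the byte 0xFF can never start a UTF-8 sequence, so it is replaced and 0x41 still decodes as `A`. With 0xCE 0x41, the lead byte expects a continuation byte but finds 0x41, so only the lead byte is expected to be dropped and 0x41 is left for the next call.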
The replacement character (U+fffd) is also returned if the source starts with a well-formed code unit sequence whose decoded value fails further validation: the value is out of range (beyond the 0x10ffff limit of Unicode), maps to UTF-16 surrogates (U+d800..U+dfff), or uses an overlong encoding (such as 0xc080 for the NUL character in modified UTF-8).
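Each validation rule can be checked with a short, structurally well-formed sequence. This is a sketch that assumes kotlinx-io-core is available. `decodeFirst` is a hypothetical helper, not part of kotlinx.io.

```kotlin
import kotlinx.io.Buffer
import kotlinx.io.readCodePointValue

// Hypothetical helper (not part of kotlinx.io): writes the given byte values
// into a fresh buffer and decodes the first code point from them.
fun decodeFirst(vararg bytes: Int): Int {
    val buffer = Buffer()
    for (b in bytes) buffer.writeByte(b.toByte())
    return buffer.readCodePointValue()
}
```

Under these assumptions, each of the following sequences should decode to 0xfffd:

- 0xC0 0x80: an overlong NUL.
- 0xED 0xA0 0x80: the surrogate U+D800.
- 0xF4 0x90 0x80 0x80: the value 0x110000, just past the Unicode limit.

By contrast, 0xF4 0x8F 0xBF 0xBF decodes to 0x10ffff, the largest valid code point.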
Note that, in general, the returned value cannot be converted directly to a Char. It may lie outside the range of Char values, and in that case it must be converted manually to a surrogate pair.
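The manual conversion can be wrapped in a small helper that handles both cases. The name `codePointToChars` is hypothetical and not part of kotlinx.io or the Kotlin standard library. The sketch uses only the standard library and applies the same surrogate arithmetic as the second sample below.

```kotlin
// Hypothetical helper: converts a Unicode code point (for example, a value returned
// by readCodePointValue) into either a single Char or a UTF-16 surrogate pair.
fun codePointToChars(codePoint: Int): CharArray {
    require(codePoint in 0..0x10FFFF) { "not a Unicode code point: $codePoint" }
    if (codePoint < 0x10000) {
        // Basic Multilingual Plane: the value fits in a single Char
        return charArrayOf(codePoint.toChar())
    }
    val offset = codePoint - 0x10000
    val high = (0xD800 + (offset ushr 10)).toChar()   // top 10 bits of the offset
    val low = (0xDC00 + (offset and 0x3FF)).toChar()  // bottom 10 bits of the offset
    return charArrayOf(high, low)
}
```

For example, `codePointToChars(0x394)` yields the single Char `'\u0394'`, and `codePointToChars(0x1F31A).concatToString()` yields `"🌚"`.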
Throws
EOFException when the source is exhausted before a complete code point can be read.
when the source is closed.
when some I/O error occurs.
Samples
import kotlinx.io.*
import kotlin.test.*
fun main() {
//sampleStart
val buffer = Buffer()
buffer.writeUShort(0xce94U)
assertEquals(0x394, buffer.readCodePointValue()) // decodes a single UTF-8 encoded code point
//sampleEnd
}
import kotlinx.io.*
import kotlin.test.*
fun main() {
//sampleStart
val buffer = Buffer()
// that's a U+1F31A, a.k.a. "new moon with face"
buffer.writeString("🌚")
// it should be encoded with 4 code units
assertEquals(4, buffer.size)
// let's read it back as a single code point
val moonCodePoint = buffer.readCodePointValue()
// all code units were consumed
assertEquals(0, buffer.size)
// the moon is too wide to fit in a single UTF-16 character!
assertNotEquals(moonCodePoint, moonCodePoint.toChar().code)
// "too wide" means in the [U+010000, U+10FFFF] range
assertTrue(moonCodePoint in 0x10000..0x10FFFF)
// See https://en.wikipedia.org/wiki/UTF-16#Code_points_from_U+010000_to_U+10FFFF for details
val highSurrogate = (0xD800 + (moonCodePoint - 0x10000).ushr(10)).toChar()
val lowSurrogate = (0xDC00 + (moonCodePoint - 0x10000).and(0x3FF)).toChar()
assertContentEquals(charArrayOf(highSurrogate, lowSurrogate), "🌚".toCharArray())
//sampleEnd
}