
writeCodePointValue

fun Sink.writeCodePointValue(codePoint: Int)

Encodes codePoint in UTF-8 and writes it to this sink.

codePoint must represent a valid Unicode code point, meaning that its value must lie within the Unicode codespace (U+000000..U+10ffff); otherwise, an IllegalArgumentException is thrown.

Note that, in general, a value retrieved from Char.code cannot be written directly, because it may be one half of a surrogate pair (which can be detected using Char.isSurrogate, or Char.isHighSurrogate and Char.isLowSurrogate). Such a pair of characters must be manually converted back to a single code point, which can then be written to a Sink. Without this conversion, the data written to a Sink cannot be decoded back into the string the surrogate pair was taken from.
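The conversion described above can be applied to a whole string before writing it. The following is a minimal sketch in plain Kotlin, assuming you want one Int per Unicode code point; the helper name codePointsOf is hypothetical and not part of kotlinx-io. Each returned value could then be passed to writeCodePointValue.

```kotlin
// Hypothetical helper (not part of kotlinx-io): splits a String into Unicode
// code points, merging each high/low surrogate pair into a single code point.
fun codePointsOf(text: String): List<Int> {
    val result = mutableListOf<Int>()
    var i = 0
    while (i < text.length) {
        val c = text[i]
        if (c.isHighSurrogate() && i + 1 < text.length && text[i + 1].isLowSurrogate()) {
            val low = text[i + 1]
            // Combine the pair: 0x10000 + ((high - 0xD800) << 10 | (low - 0xDC00))
            result.add(0x10000 + (((c.code - 0xD800) shl 10) or (low.code - 0xDC00)))
            i += 2
        } else {
            // BMP character, or an unpaired surrogate passed through unchanged
            result.add(c.code)
            i += 1
        }
    }
    return result
}
```

For example, codePointsOf("aΔ🌞") yields 0x61, 0x394 and 0x1F31E. An unpaired surrogate is passed through as-is, so writing it with writeCodePointValue would still produce a replacement character.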

More specifically, every code point in the UTF-16 surrogate range (U+d800..U+dfff) is written as a single ? character (U+003F).
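The validation and encoding rules described so far can be sketched in plain Kotlin. This is a minimal illustration, not the kotlinx-io implementation: the function name encodeCodePointUtf8 is hypothetical, and surrogate code points are mapped to a single ? byte only to mirror the behavior documented for writeCodePointValue.

```kotlin
// Hypothetical helper: UTF-8-encodes one code point following the rules documented
// for writeCodePointValue. It is NOT the kotlinx-io implementation.
fun encodeCodePointUtf8(codePoint: Int): ByteArray {
    // Values outside the Unicode codespace are rejected (require throws IllegalArgumentException).
    require(codePoint in 0..0x10FFFF) { "Code point is out of range: $codePoint" }
    return when {
        // 1 byte: 0xxxxxxx
        codePoint < 0x80 -> byteArrayOf(codePoint.toByte())
        // 2 bytes: 110xxxxx 10xxxxxx
        codePoint < 0x800 -> byteArrayOf(
            (0xC0 or (codePoint shr 6)).toByte(),
            (0x80 or (codePoint and 0x3F)).toByte()
        )
        // Surrogate code points are replaced with a single '?' (U+003F)
        codePoint in 0xD800..0xDFFF -> byteArrayOf('?'.code.toByte())
        // 3 bytes: 1110xxxx 10xxxxxx 10xxxxxx
        codePoint < 0x10000 -> byteArrayOf(
            (0xE0 or (codePoint shr 12)).toByte(),
            (0x80 or ((codePoint shr 6) and 0x3F)).toByte(),
            (0x80 or (codePoint and 0x3F)).toByte()
        )
        // 4 bytes: 11110xxx 10xxxxxx 10xxxxxx 10xxxxxx
        else -> byteArrayOf(
            (0xF0 or (codePoint shr 18)).toByte(),
            (0x80 or ((codePoint shr 12) and 0x3F)).toByte(),
            (0x80 or ((codePoint shr 6) and 0x3F)).toByte(),
            (0x80 or (codePoint and 0x3F)).toByte()
        )
    }
}
```

With this sketch, 'Δ' (U+0394) encodes to the bytes CE 94 and U+1F31E to F0 9F 8C 9E, while a lone surrogate such as U+D83C becomes the single byte 3F.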

Parameters

codePoint

the code point to be written.

Throws

IllegalStateException

when the sink is closed.

IllegalArgumentException

when the codePoint value is negative or greater than U+10ffff.

IOException

when an I/O error occurs.

Samples

import kotlinx.io.*
import kotlin.test.*

fun main() {
    val buffer = Buffer()

    // Basic Latin (a.k.a. ASCII) characters are encoded with a single byte
    buffer.writeCodePointValue('Y'.code)
    assertContentEquals(byteArrayOf(0x59), buffer.readByteArray())

    // wider characters are encoded into multiple UTF-8 code units
    buffer.writeCodePointValue('Δ'.code)
    assertContentEquals(byteArrayOf(0xce.toByte(), 0x94.toByte()), buffer.readByteArray())

    // note the difference: writeInt doesn't UTF-8-encode the code point as writeCodePointValue does;
    // it writes the raw 32-bit value in big-endian byte order
    buffer.writeInt('Δ'.code)
    assertContentEquals(byteArrayOf(0, 0, 0x3, 0x94.toByte()), buffer.readByteArray())
}
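The sample shows that the number of bytes written depends on the code point, whereas writeInt always writes four. A minimal sketch for estimating that size up front, assuming the documented rules (including a single ? byte for surrogate code points), might look like this; utf8ByteCount is a hypothetical helper, not a kotlinx-io API.

```kotlin
// Hypothetical helper: number of bytes a single code point occupies when
// UTF-8-encoded under the rules documented for writeCodePointValue.
fun utf8ByteCount(codePoint: Int): Int {
    require(codePoint in 0..0x10FFFF) { "Code point is out of range: $codePoint" }
    return when {
        codePoint < 0x80 -> 1                // Basic Latin
        codePoint < 0x800 -> 2               // e.g. Greek, Cyrillic
        codePoint in 0xD800..0xDFFF -> 1     // surrogates: replaced with '?'
        codePoint < 0x10000 -> 3             // rest of the BMP
        else -> 4                            // supplementary planes
    }
}
```

Under these rules, 'Y' takes 1 byte, 'Δ' 2 bytes, '€' (U+20AC) 3 bytes, and U+1F31E 4 bytes.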

import kotlinx.io.*
import kotlin.test.*

fun main() {
    val buffer = Buffer()

    // U+1F31E (a.k.a. "sun with face") is too wide to fit in a single UTF-16 code unit,
    // so it's represented using a surrogate pair.
    val chars = "🌞".toCharArray()
    assertEquals(2, chars.size)

    // such a pair has to be manually converted to a single code point
    assertTrue(chars[0].isHighSurrogate())
    assertTrue(chars[1].isLowSurrogate())

    val highSurrogate = chars[0].code
    val lowSurrogate = chars[1].code

    // see https://en.wikipedia.org/wiki/UTF-16#Code_points_from_U+010000_to_U+10FFFF for details
    val codePoint = 0x10000 + (highSurrogate - 0xD800).shl(10).or(lowSurrogate - 0xDC00)
    assertEquals(0x1F31E, codePoint)

    // now we can write the code point
    buffer.writeCodePointValue(codePoint)
    // and read the correct string back
    assertEquals("🌞", buffer.readString())

    // writing the surrogates as they are won't achieve that:
    // each surrogate code point is written as a '?' character
    buffer.apply {
        writeCodePointValue(highSurrogate)
        writeCodePointValue(lowSurrogate)
    }
    assertNotEquals("🌞", buffer.readString())
}
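Going the other way, a supplementary code point can be split back into the surrogate pair the sample started from. The sketch below simply inverts the formula used in that sample; toSurrogatePair is a hypothetical helper, not part of kotlinx-io. On the JVM, java.lang.Character.toChars performs the same conversion.

```kotlin
// Hypothetical helper (not part of kotlinx-io): splits a supplementary code point
// (U+10000..U+10FFFF) into its UTF-16 high and low surrogates.
fun toSurrogatePair(codePoint: Int): Pair<Char, Char> {
    require(codePoint in 0x10000..0x10FFFF) { "Not a supplementary code point: $codePoint" }
    val offset = codePoint - 0x10000
    // Upper 10 bits of the offset go into the high surrogate
    val high = (0xD800 + (offset shr 10)).toChar()
    // Lower 10 bits of the offset go into the low surrogate
    val low = (0xDC00 + (offset and 0x3FF)).toChar()
    return high to low
}
```

For U+1F31E this yields the pair U+D83C, U+DF1E, which together form the string "🌞" again.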