Kotlin Download PDF
Table of contents

Kotlin language specification

Version 1.9-rfc+0.1

Marat Akhin

Mikhail Belyaev

Control- and data-flow analysis

Several Kotlin features such as variable initialization analysis and smart casting analysis require performing control- and data-flow analyses. This section describes them and their applications.

Control flow graph

We define all control-flow analyses for Kotlin on a classic model called a control-flow graph (CFG). A CFG of a program is a graph which loosely defines all feasible paths the flow of a particular program can take during execution. All CFGs given in this section are intraprocedural, meaning that they describe the flow inside a single function, not taking function calls into account. CFG may, however, include multiple function bodies if said functions are declared inside each other (as is the case for lambdas).

The following sections describe CFG fragments associated with a particular Kotlin code construct. These fragments are introduced using visual notation rather than relational notation to simplify the understanding of the graph structure. To represent intermediate values created during computation, we use implicit registers, denoted $1, $2, $3, etc. These are considered to be unique in each CFG fragment (assigning the same register twice in the same CFG may only occur in unrelated program paths) and in the complete CFG, too. The numbers given are only notational.

We introduce special eval nodes, represented in dashed lines, to connect CFG fragments into bigger fragments. eval x here means that this node must be replaced with the whole CFG fragment associated with x. When this replacement is performed, the value produced by eval is the same value that the meta-register $result holds in the corresponding fragment. All incoming edges of a fragment are connected to the incoming edges of the eval node, while all outgoing edges of a fragment are connected to the outgoing edges of the eval node. It is important, however, that, if such edges are absent either in the fragment or in the eval node, they (edges) are removed from the CFG.

We also use the eval b notation where b is not a single statement, but rather a control structure body. The fragment for a control structure body is the sequence of fragments for its statements, connected in the program order.

Some of the fragments have two kinds of outgoing edges, labeled t and f on the pictures. In a similar fashion, some eval nodes have two outgoing edges with the same labels. If such a fragment is inserted into such a node, only edges with matching labels are merged into each other. If either the fragment or the node have only unlabeled outgoing edges, the process is performed same as above.

For some types of analyses, it is important which boolean conditions hold on a control flow path. We use special assume nodes to introduce these conditions. assume x means that boolean condition x is always true when program flow passes through this particular node.

Some nodes are labeled, similarly to how statements may be labeled in Kotlin. Labeled nodes are considered CFG-unique and are handled as follows: if a fragment mentions a particular labeled node, this node is the same as any other node with this label in the complete CFG (i.e., a singular actual node is shared between all its labeled references). This is important when building graphs representing loops.

There are two other special kinds of nodes: unreachable nodes, signifying unreachable code, and backedge nodes, important for some kinds of analyses.

Expressions

Simple expressions, like literals and references, do not affect the control-flow of the program in any way and are irrelevant w.r.t. CFG.

Function calls and operators

Note: we do not consider operator calls as being different from function calls, as they are just special types of function calls. Henceforth, they are not treated separately.

x.f(arg1,..., argN)
f(arg1,..., argN)
Conditional expressions

Note: to simplify the notation, we consider only if-expressions with both branches present. Any if-statement in Kotlin may be trivially turned into such an expression by replacing the missing else branch with a kotlin.Unit object expression.

if(c) tt else ff
when { 
    c1 -> b1
    else -> bE
}

Important: we only consider when expressions having exactly two branches for simplicity. A when expression with more than two branches may be trivially desugared into a series of nested when expression as follows:

when { 
    <entry1>
    <entries...>
    else -> bE
}

is the same as

when { 
    <entry1>
    else -> {
        when {
            <entries...>
            else -> bE
        }
    }
}
Boolean operators
!x
x || y
x && y
Other expressions
x ?: y
x?.y
try { a... }
catch (e1: T1) { b1... }
...
catch (eN: TN) { bN... }
finally { c... }

Important: in this diagram we consider finally block twice. The (1) block is used when handling the finally block and its body. The (2) block is used when considering the finally block w.r.t. rest of the CFG.

a!!
a as T
a as? T
{ a: T ... -> body... }
return
return@label
return a
return@label a
throw a
break@loop
continue@loop

Statements

Note: to simplify the notation, we consider only labeled loops, as unlabeled loops may be trivially turned into labeled ones by assigning them a unique label.

loop@ while(c) { b... }
loop@ do { b... } while(c)

Declarations

var a = b
var a by b
val a = b
val a by b
fun f() { body... }
class A (...) { 
    'declaration 1'
    'declaration 2'
    'init-block 1'
    'declaration 3'
    'init-block 2'
    ...
}

For every declaration and init block in a class body, the control flow is propagated through every element in the order of their appearance. Here we give a simplified example.

Examples

fun f() = listOf(1, 2).map { it + 2 }.filter { it > 0 }
fun f(x: Int) {
    var y = x
    loop@ while(y != 500) {
        y++
        if(y % 20 == 3) break@loop
    }
}

kotlin.Nothing and its influence on the CFG

As discussed in the type system section of this specification, kotlin.Nothing is an uninhabited type, meaning an instance of this type can never exist at runtime. For the purposes of control-flow graph (and related analyses) this means, as soon as an expression is known statically to have kotlin.Nothing type, all subsequent code is unreachable.

Important: each specific analysis may decide to either use this information or ignore it for a given program. If unreachability from kotlin.Nothing is used, it can be represented in different ways, e.g., by changing the CFG structure or via killDataFlow\operatorname{\mathit{killDataFlow}} instructions.

Performing analyses on the control-flow graph

The analyses defined in this document follow the pattern of analyses based on monotone frameworks, which work by modeling abstract program states as elements of lattices and joining these states using standard lattice operations. Such analyses may achieve limited path sensitivity via the analysis of conditions used in the assume nodes.

In short, an analysis is defined on the CFG by introducing:

The result of an analysis is a fixed point of the transfer function for each node of the given CFG, i.e., an abstract state for each node such that the transfer function maps the state to itself. For the particular shapes of the transfer function used in program analyses, given a finite S\mathbf{S} , the fixed point always exists, although the details of how this works go out of scope of this document.

Types of lattices

Preliminary analysis and killDataFlow\operatorname{\mathit{killDataFlow}} instruction

Some analyses described further in this document are based on special instruction called killDataFlow(υ)\operatorname{\mathit{killDataFlow}}(\upsilon) where υ\upsilon is a program variable. These are not present in the graph representation described above and need to be inferred before such analyses may actually take place.

killDataFlow\operatorname{\mathit{killDataFlow}} inference is based on a standard control-flow analysis with the lattice of natural numbers over “min” and “max” operations. That is, for every assignable property xx an element of this lattice is a natural number NN , with the least upper bound of two numbers defined as maximum function and the greatest lower bound as minimum function.

Note: such lattice has 0 as its bottom element and does not have a top element.

We assume the following transfer functions for our analysis.

[ ⁣[x = y] ⁣](s)=s[xs(x)+1][ ⁣[backedge] ⁣](s)={0}[ ⁣[l] ⁣](s)=ppredecessor(l)[ ⁣[p] ⁣](s) \begin{alignedat}{2} &\left[\!\left[\texttt{x = y} \right]\!\right](s) &&= s[x \rightarrow s(x) + 1] \\ \\ &\left[\!\left[\operatorname{\texttt{backedge}}\right]\!\right](s) &&= \{\star \rightarrow 0 \} \\ \\ &\left[\!\left[l \right]\!\right](s) &&= \bigsqcup_{p \in predecessor(l)} \left[\!\left[p \right]\!\right](s) \end{alignedat}

After running this analysis, for every backedge bb and every variable xx present in ss , if bp,bs:bppredecessors(b)bssuccessors(b)[ ⁣[bp] ⁣](x)>[ ⁣[bs] ⁣](x)\exists b_p, b_s: b_p \in predecessors(b) \land b_s \in successors(b) \land \left[\!\left[b_p \right]\!\right](x) > \left[\!\left[b_s \right]\!\right](x) , a killDataFlow(x)\operatorname{\mathit{killDataFlow}}(x) instruction must be inserted after bb .

Informally: this somewhat complicated condition matches variables which have been assigned to in the loop body w.r.t. this loop’s backedge.

Note: this analysis does involve a possibly infinite lattice (a lattice of natural numbers) and may seem to diverge on some graphs. However, if we assume that every backedge in an arbitrary CFG is marked with a backedge instruction, it is trivial to prove that no number in the lattice will ever exceed the number of assignments (which is finite) in the analyzed program as any loop in the graph will contain at least one backedge.

As an example, consider the following Kotlin code:

var x: Int = 0
var y: Int = 0
while (b1) {
    y = f()
    do {
        x = g()
    } while (b2)
}

which results in the following CFG diagram (annotated with the analysis results where it is important):

There are two backedges: one for the inner loop (the inner backedge) and one for the outer loop (the outer backedge). The inner backedge has one predecessor with state {x2,y2}\{ \texttt{x} \rightarrow 2, \texttt{y} \rightarrow 2 \} and one successor with state {x1,y2}\{ \texttt{x} \rightarrow 1, \texttt{y} \rightarrow 2 \} with the value for x being less in the successor, meaning that we need to insert killDataFlow(x)\operatorname{\mathit{killDataFlow}}(\texttt{x}) after the backedge. The outer backedge has one predecessor with state {x2,y2}\{ \texttt{x} \rightarrow 2, \texttt{y} \rightarrow 2 \} and one successor with state {x1,y1}\{ \texttt{x} \rightarrow 1, \texttt{y} \rightarrow 1 \} with values for both variables being less in the successor, meaning we need to insert killDataFlow(x)\operatorname{\mathit{killDataFlow}}(\texttt{x}) and killDataFlow(y)\operatorname{\mathit{killDataFlow}}(\texttt{y}) after the backedge.

Variable initialization analysis

Kotlin allows non-delegated properties to not have initializers in their declaration as long as the property is definitely assigned before its first usage. This property is checked by the variable initialization analysis (VIA). VIA operates on abstract values from the assignedness lattice, which is a flat lattice constructed over the set {Assigned,Unassigned}\{\operatorname{\mathit{Assigned}}, \operatorname{\mathit{Unassigned}}\} . The analysis itself uses abstract values from a map lattice of all property declarations to their abstract states based on the assignedness lattice. The abstract states are propagated in a forward manner using the standard join operation to merge states from different paths.

The CFG nodes relevant to VIA include only property declarations and direct property assignments. Every property declaration adds itself to the domain by setting the Unassigned\operatorname{\mathit{Unassigned}} value to itself. Every direct property assignment changes the value for this property to Assigned\operatorname{\mathit{Assigned}} .

The results of the analysis are interpreted as follows. For every property, any usage of the said property in any statement is a compile-time error unless the abstract state of this property at this statement is Assigned\operatorname{\mathit{Assigned}} . For every read-only property (declared using val keyword), any assignment to this property is a compile-time error unless the abstract state of this property is Unassigned\operatorname{\mathit{Unassigned}} .

As an example, consider the following Kotlin code:

/* 1 */ val x: Int //{xUnassigned,}\{ \mathtt{x} \rightarrow \mathit{Unassigned}, \star \rightarrow \bot \}
/* 2 */ var y: Int //{xUnassigned,yUnassigned,}\{ \mathtt{x} \rightarrow \mathit{Unassigned}, \mathtt{y} \rightarrow \mathit{Unassigned}, \star \rightarrow \bot \}
/* 3 */ if (c) { //
/* 4 */ x = 40 //{xAssigned,yUnassigned,}\{ \mathtt{x} \rightarrow \mathit{Assigned}, \mathtt{y} \rightarrow \mathit{Unassigned}, \star \rightarrow \bot \}
/* 5 */ y = 4 //{xAssigned,yAssigned,}\{ \mathtt{x} \rightarrow \mathit{Assigned}, \mathtt{y} \rightarrow \mathit{Assigned}, \star \rightarrow \bot \}
/* 6 */ } else { //
/* 7 */ x = 20 //{xAssigned,yUnassigned,}\{ \mathtt{x} \rightarrow \mathit{Assigned}, \mathtt{y} \rightarrow \mathit{Unassigned}, \star \rightarrow \bot \}
/* 8 */ } //{xAssigned,y,}\{ \mathtt{x} \rightarrow \mathit{Assigned}, \mathtt{y} \rightarrow \top, \star \rightarrow \bot \}
/* 9 */ y = 5 //{xAssigned,yAssigned,}\{ \mathtt{x} \rightarrow \mathit{Assigned}, \mathtt{y} \rightarrow \mathit{Assigned}, \star \rightarrow \bot \}
/* 10 */ val z = x + y //{xAssigned,yAssigned,zAssigned}\{ \mathtt{x} \rightarrow \mathit{Assigned}, \mathtt{y} \rightarrow \mathit{Assigned}, \mathtt{z} \rightarrow \mathit{Assigned}\}

There are no incorrect operations in this example, so the code does not produce any compile-time errors.

Let us consider another example:

/* 1 */ val x: Int //{xUnassigned,}\{ \mathtt{x} \rightarrow \mathit{Unassigned}, \star \rightarrow \bot \}
/* 2 */ var y: Int //{xUnassigned,yUnassigned,}\{ \mathtt{x} \rightarrow \mathit{Unassigned}, \mathtt{y} \rightarrow \mathit{Unassigned}, \star \rightarrow \bot \}
/* 3 */ while (c) { //{x,y,}\{ \mathtt{x} \rightarrow \top, \mathtt{y} \rightarrow \top, \star \rightarrow \bot \} Error!
/* 4 */ x = 40 //{x,y,}\{ \mathtt{x} \rightarrow \top, \mathtt{y} \rightarrow \top, \star \rightarrow \bot \}
/* 5 */ y = 4 //{x,y,}\{ \mathtt{x} \rightarrow \top, \mathtt{y} \rightarrow \top, \star \rightarrow \bot \}
/* 6 */ } //
/* 7 */ val z = x + y //{x,y,}\{ \mathtt{x} \rightarrow \top, \mathtt{y} \rightarrow \top, \star \rightarrow \bot \} More errors!

In this example, the state of both properties at line 3 is \top , as it is the least upper bound of the states from lines 5 and 2 (from the while loop), which is derived to be \top . This leads to a compile-time error at line 4 for x, because one cannot reassign a read-only property.

At line 7 there is another compile-time error when both properties are used, as there are paths in the CFG which reach line 7 when the properties have not been assigned (i.e., the case when the while loop body was skipped).

Smart casting analysis

See the corresponding section for details.

Function contracts

Note: as of Kotlin 1.9\textrm{1.9}{} , contracts for user-defined functions are an experimental feature and, thus, not described here

Some standard-library functions in Kotlin are defined in such a way that they adhere to a specific call contract that affects the way calls to such functions are analyzed from the perspective of the caller’s control flow graph. A function’s call contract consists of one or more effects.

There are several kinds of effects:

Calls-in-place effect of function FF for a function-type parameter PP specifies that for every call of FF parameter PP will be also invoked as a function. This effect may also have one of the three invocation types:

These effects change the call graph that is produced for a function call of FF when supplied a lambda-expression parameter for PP . Without any effect, the graph looks like this:

For a function call

f(..., { lambda-body... }, ...)

Please note that control flow information is passed inside the lambda body, but no information is extracted from it. If the corresponding parameter PP is introduced with exactly-once effect, this changes to:

If the corresponding parameter PP is introduced with at-least-once effect, this changes to:

If the corresponding parameter PP is introduced with at-most-once effect, this changes to:

This allows the control-flow information to be extracted from lambda expression according to the policy of its invocation.

Returns-implies-condition effect of function FF for a boolean parameter PP specifies that if, when invoked normally, a call to FF returns, PP is assumed to be true. For a function call

f(..., p, ...)

this changes normal call graph that looks like this:

to look like this:

The following standard library functions have contracts with the following effects:

Examples:

This code would result in a initialized variable analysis violation if run was not a standard function with corresponding contract:

val x: Int
run { // run invokes its argument exactly once
    x = 4
}
// could be error: x is not initialized
// but is ok
println(x) 

Several examples of contract-introduced smart-cast:

val x: Any = ...
check(x is Int)
// x is known to be Int thanks to assume introduced by 
// the contract of check
val y = x + 4 // would be illegal without contract
val x: Int? = ...
// x is known to be non-null thanks to assume introduced by 
// the contract of require
require(x != null)
val y = x + 4 // would be illegal without contract

References

  1. Frances E. Allen. “Control flow analysis.” ACM SIGPLAN Notices, 1970.
  2. Flemming Nielson, Hanne R. Nielson, and Chris Hankin. “Principles of program analysis.” Springer, 2015.
  3. Kam, John B., and Jeffrey D. Ullman. “Monotone data flow analysis frameworks.” Acta informatica 7.3 (1977): 305-317.
  4. Anders Moller, and Michael I. Schwartzbach. “Static Program Analysis.” 2018 ()