Kotlin language specification

Several Kotlin features, such as variable initialization analysis and smart casting analysis, require performing control- and data-flow analyses. This section describes these analyses and their applications.

We define all control-flow analyses for Kotlin on a classic model called a control-flow graph (CFG). A CFG of a program is a graph which loosely defines all feasible paths the flow of a particular program can take during execution. All CFGs given in this section are intraprocedural, meaning that they describe the flow inside a single function, not taking function calls into account. A CFG may, however, include multiple function bodies if said functions are declared inside each other (as is the case for lambdas).

The following sections describe CFG fragments associated with a particular Kotlin code construct. These fragments are introduced using visual notation rather than relational notation to simplify the understanding of the graph structure. To represent intermediate values created during computation, we use implicit registers, denoted $1, $2, $3, etc. These are considered to be unique in each CFG fragment (assigning the same register twice in the same CFG may only occur in unrelated program paths) and in the complete CFG, too. The numbers given are only notational.

We introduce special eval nodes, drawn with dashed lines, to connect CFG fragments into bigger fragments. Here, eval x means that this node must be replaced with the whole CFG fragment associated with x. When this replacement is performed, the value produced by eval is the same value that the meta-register $result holds in the corresponding fragment. All incoming edges of a fragment are connected to the incoming edges of the eval node, while all outgoing edges of a fragment are connected to the outgoing edges of the eval node. Note, however, that if such edges are absent either in the fragment or in the eval node, they are removed from the CFG.

We also use the eval b notation where b is not a single statement, but rather a control structure body. The fragment for a control structure body is the sequence of fragments for its statements, connected in the program order.

Some of the fragments have two kinds of outgoing edges, labeled t and f in the pictures. Similarly, some eval nodes have two outgoing edges with the same labels. If such a fragment is inserted into such a node, only edges with matching labels are merged with each other. If either the fragment or the node has only unlabeled outgoing edges, the process is the same as above.

For some types of analyses, it is important which boolean conditions hold on a control flow path. We use special assume nodes to introduce these conditions. assume x means that boolean condition x is always true when program flow passes through this particular node.
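As an illustration of a path condition, the true branch of `if (x is String)` is only entered on paths where the condition holds, which is the kind of fact an assume node expresses; this is what lets the compiler treat x as a String inside that branch (smart casts are covered in a separate section). A small function of our own showing the effect (the name `describe` is hypothetical):

```kotlin
fun describe(x: Any): String =
    if (x is String) {
        // Only reachable on paths where `x is String` holds, so x is smart cast to String.
        "string of length ${x.length}"
    } else {
        // Here the negated condition holds, and x keeps its declared type Any.
        "not a string: $x"
    }
```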

Some nodes are labeled, similarly to how statements may be labeled in Kotlin. Labeled nodes are considered CFG-unique and are handled as follows: if a fragment mentions a particular labeled node, this node is the same as any other node with this label in the complete CFG (i.e., a singular actual node is shared between all its labeled references). This is important when building graphs representing loops.

There are two other special kinds of nodes: unreachable nodes, signifying unreachable code, and backedge nodes, important for some kinds of analyses.

Simple expressions, like literals and references, do not affect the control flow of the program in any way and are irrelevant with respect to the CFG.

In a class body, control flow passes through every declaration and init block in the order of their appearance. Here we give a simplified example.
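The same ordering can be observed at run time. In the following class (our own illustration; the names are hypothetical), each property initializer and init block appends an entry to a log, and the log records them in textual order. The log itself is declared first so that it is already initialized when the other elements run.

```kotlin
class InitOrder {
    // Declared first, so it is initialized before the elements below use it.
    val log = mutableListOf<String>()
    val first = "first".also { log += "property first" }
    init { log += "init block 1" }
    val second = "second".also { log += "property second" }
    init { log += "init block 2" }
}
```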

As discussed in the type system section of this specification, kotlin.Nothing is an uninhabited type, meaning an instance of this type can never exist at runtime. For the purposes of the control-flow graph (and related analyses), this means that as soon as an expression is statically known to have type kotlin.Nothing, all subsequent code is unreachable.
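For example, a call to a function whose return type is kotlin.Nothing never completes normally, so the path through it ends there. In the following sketch (the function names are ours), the `s == null` branch ends in such a call; the only path reaching the `return` is the one where s is not null, and the compiler accepts `s.length` via a smart cast.

```kotlin
// A function returning kotlin.Nothing can only end by throwing.
fun fail(message: String): Nothing = throw IllegalArgumentException(message)

fun safeLength(s: String?): Int {
    if (s == null) fail("s must not be null")
    // Code after the Nothing-typed call is unreachable on the null path,
    // so s is smart cast to String here.
    return s.length
}
```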

The analyses defined in this document follow the pattern of analyses based on monotone frameworks, which work by modeling abstract program states as elements of lattices and joining these states using standard lattice operations. Such analyses may achieve limited path sensitivity via the analysis of conditions used in the assume nodes.

In short, an analysis is defined on the CFG by introducing:

The result of an analysis is a fixed point of the transfer function for each node of the given CFG, i.e., an abstract state for each node such that the transfer function maps the state to itself. For the monotone transfer functions used in such analyses and a finite $\mathbf{S}$, this fixed point always exists, although the details of why are out of scope of this document.
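Such a fixed point is usually computed by iteration. The following is a minimal Kotlin sketch of a worklist solver, not a description of any actual compiler; the `Lattice` interface, the `solve` function, and the encoding of CFG nodes as integers with an edge list are our own assumptions. Each node's output is recomputed from the join of its predecessors' outputs until nothing changes, which terminates for monotone transfer functions over a lattice of finite height.

```kotlin
// A join-semilattice with a least element (hypothetical interface for this sketch).
interface Lattice<S> {
    val bottom: S
    fun join(a: S, b: S): S
}

// Example lattice: sets of names ordered by inclusion, joined by union.
object SetUnion : Lattice<Set<String>> {
    override val bottom: Set<String> = emptySet()
    override fun join(a: Set<String>, b: Set<String>): Set<String> = a + b
}

// Worklist iteration: a node's output is transfer(node, join of its predecessors' outputs).
// When the worklist is empty, every node's output satisfies that equation (a fixed point).
fun <S> solve(
    nodeCount: Int,
    edges: List<Pair<Int, Int>>,
    lattice: Lattice<S>,
    transfer: (node: Int, input: S) -> S,
): List<S> {
    val preds = List(nodeCount) { n -> edges.filter { it.second == n }.map { it.first } }
    val succs = List(nodeCount) { n -> edges.filter { it.first == n }.map { it.second } }
    val out = MutableList(nodeCount) { lattice.bottom }
    val worklist = ArrayDeque((0 until nodeCount).toList())
    while (worklist.isNotEmpty()) {
        val n = worklist.removeFirst()
        val input = preds[n].fold(lattice.bottom) { acc, p -> lattice.join(acc, out[p]) }
        val newOut = transfer(n, input)
        if (newOut != out[n]) {
            out[n] = newOut
            worklist.addAll(succs[n]) // successors' outputs may now change
        }
    }
    return out
}
```

With `SetUnion` and a transfer function that adds the assigned variable's name, this computes which variables may have been assigned on some path to each node.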

Some analyses described further in this document are based on a special instruction called $\operatorname{\mathit{killDataFlow}}(\upsilon)$, where $\upsilon$ is a program variable. Such instructions are not present in the graph representation described above and need to be inferred before these analyses can take place.

$\operatorname{\mathit{killDataFlow}}$ inference is based on a standard control-flow analysis with the lattice of natural numbers over the “min” and “max” operations. That is, for every assignable property $x$, the abstract value is a natural number $N$; the least upper bound of two numbers is their maximum, and the greatest lower bound is their minimum.

We assume the following transfer functions for our analysis.

$\begin{alignedat}{2} &\left[\!\left[\texttt{x = y} \right]\!\right](s) &&= s[x \rightarrow s(x) + 1] \\ \\ &\left[\!\left[\operatorname{\texttt{backedge}}\right]\!\right](s) &&= \{\star \rightarrow 0 \} \\ \\ &\left[\!\left[l \right]\!\right](s) &&= \bigsqcup_{p \in predecessors(l)} \left[\!\left[p \right]\!\right](s) \end{alignedat}$

After running this analysis, for every backedge $b$ and every variable $x$ present in $s$ , if $\exists b_p, b_s: b_p \in predecessors(b) \land b_s \in successors(b) \land \left[\!\left[b_p \right]\!\right](x) > \left[\!\left[b_s \right]\!\right](x)$ , a $\operatorname{\mathit{killDataFlow}}(x)$ instruction must be inserted after $b$ .
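A minimal Kotlin sketch of this inference on a hand-built CFG follows. The node classes, the `Cfg` type, and the function names are hypothetical and not part of any compiler API. The sketch takes the state of a node to be its output state, and it assumes that every cycle in the graph passes through a backedge node, which keeps the counts bounded.

```kotlin
// Hypothetical node kinds: only assignments and backedges matter for this analysis.
sealed interface Node
data class Assign(val variable: String) : Node // an assignment `variable = ...`
object Backedge : Node                          // a loop backedge
object Plain : Node                             // any other node

class Cfg(val nodes: List<Node>, val edges: List<Pair<Int, Int>>) {
    fun preds(n: Int) = edges.filter { it.second == n }.map { it.first }
    fun succs(n: Int) = edges.filter { it.first == n }.map { it.second }
}

// Abstract state: a natural number per variable; absent variables count as 0.
typealias Counts = Map<String, Int>

// Least upper bound (maximum), taken pointwise.
fun join(a: Counts, b: Counts): Counts =
    (a.keys + b.keys).associateWith { maxOf(a[it] ?: 0, b[it] ?: 0) }

fun transfer(node: Node, s: Counts): Counts = when (node) {
    is Assign -> s + (node.variable to (s[node.variable] ?: 0) + 1) // s[x -> s(x) + 1]
    is Backedge -> emptyMap()                                        // {* -> 0}
    is Plain -> s
}

// Round-robin iteration until no output state changes.
fun countAssignments(cfg: Cfg): List<Counts> {
    val out = MutableList<Counts>(cfg.nodes.size) { emptyMap() }
    var changed = true
    while (changed) {
        changed = false
        for (n in cfg.nodes.indices) {
            val input = cfg.preds(n).fold(emptyMap<String, Int>()) { acc, p -> join(acc, out[p]) }
            val newOut = transfer(cfg.nodes[n], input)
            if (newOut != out[n]) {
                out[n] = newOut
                changed = true
            }
        }
    }
    return out
}

// For every backedge b: variables x with [[b_p]](x) > [[b_s]](x) for some predecessor b_p and successor b_s.
fun inferKillDataFlow(cfg: Cfg): Map<Int, Set<String>> {
    val state = countAssignments(cfg)
    return cfg.nodes.indices.filter { cfg.nodes[it] is Backedge }.associateWith { b ->
        val kills = mutableSetOf<String>()
        for (p in cfg.preds(b)) for (s in cfg.succs(b)) {
            for ((x, count) in state[p]) if (count > (state[s][x] ?: 0)) kills += x
        }
        kills
    }
}
```

For a single loop of the shape `x = ...; y = ...; while (...) { x = ... }`, encoded as nodes `Assign("x"), Assign("y"), Plain, Assign("x"), Backedge, Plain` with the loop head at index 2, x is counted 2 before the backedge but 1 at the loop head, so the sketch reports $\operatorname{\mathit{killDataFlow}}(\texttt{x})$ after the backedge and nothing for y.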

As an example, consider the following Kotlin code:

which results in the following CFG diagram (annotated with the analysis results where it is important):

There are two backedges: one for the inner loop (the inner backedge) and one for the outer loop (the outer backedge). The inner backedge has one predecessor with state $\{ \texttt{x} \rightarrow 2, \texttt{y} \rightarrow 2 \}$ and one successor with state $\{ \texttt{x} \rightarrow 1, \texttt{y} \rightarrow 2 \}$; the value for x is smaller in the successor, so $\operatorname{\mathit{killDataFlow}}(\texttt{x})$ must be inserted after this backedge. The outer backedge has one predecessor with state $\{ \texttt{x} \rightarrow 2, \texttt{y} \rightarrow 2 \}$ and one successor with state $\{ \texttt{x} \rightarrow 1, \texttt{y} \rightarrow 1 \}$; the values for both variables are smaller in the successor, so both $\operatorname{\mathit{killDataFlow}}(\texttt{x})$ and $\operatorname{\mathit{killDataFlow}}(\texttt{y})$ must be inserted after this backedge.

Kotlin allows non-delegated properties to omit initializers in their declarations as long as the property is definitely assigned before its first usage. This requirement is checked by the variable initialization analysis (VIA). VIA operates on abstract values from the assignedness lattice, which is a flat lattice constructed over the set $\{\operatorname{\mathit{Assigned}}, \operatorname{\mathit{Unassigned}}\}$ . The analysis itself uses abstract values from a map lattice that maps every property declaration to an element of the assignedness lattice. The abstract states are propagated in a forward manner, using the standard join operation to merge states from different paths.

The CFG nodes relevant to VIA are only property declarations and direct property assignments. Every property declaration adds itself to the domain of the map with the value $\operatorname{\mathit{Unassigned}}$ . Every direct property assignment changes the value for this property to $\operatorname{\mathit{Assigned}}$ .

The results of the analysis are interpreted as follows. For every property, any use of that property in any statement is a compile-time error unless the abstract state of this property at this statement is $\operatorname{\mathit{Assigned}}$ . For every read-only property (declared using the val keyword), any assignment to this property is a compile-time error unless the abstract state of this property is $\operatorname{\mathit{Unassigned}}$ .
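A minimal Kotlin sketch of the VIA lattice and of the interpretation rules above follows. The names `Assignedness`, `ViaState`, `canRead`, and `canAssignVal` are hypothetical; `Bottom` and `Top` stand for the least and greatest elements of the flat lattice, and a property missing from the map is treated as `Bottom`.

```kotlin
// Flat lattice: Bottom below both Assigned and Unassigned, Top above both.
enum class Assignedness { Bottom, Assigned, Unassigned, Top }

fun join(a: Assignedness, b: Assignedness): Assignedness = when {
    a == b -> a
    a == Assignedness.Bottom -> b
    b == Assignedness.Bottom -> a
    else -> Assignedness.Top // e.g. Assigned joined with Unassigned
}

// Map lattice: property name -> assignedness, joined pointwise.
typealias ViaState = Map<String, Assignedness>

fun join(a: ViaState, b: ViaState): ViaState =
    (a.keys + b.keys).associateWith {
        join(a[it] ?: Assignedness.Bottom, b[it] ?: Assignedness.Bottom)
    }

// A use is allowed only if the property is Assigned at that statement.
fun canRead(s: ViaState, property: String) = s[property] == Assignedness.Assigned

// An assignment to a val is allowed only if the property is still Unassigned.
fun canAssignVal(s: ViaState, property: String) = s[property] == Assignedness.Unassigned
```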

As an example, consider the following Kotlin code:

There are no incorrect operations in this example, so the code does not produce any compile-time errors.

Let us consider another example:

In this example, the state of both properties at line 3 is $\top$ , as it is the least upper bound of the states from lines 2 and 5 (via the while loop). This leads to a compile-time error at line 4 for x, because a read-only property cannot be reassigned.

At line 7, there is another compile-time error where both properties are used, because there are paths in the CFG which reach line 7 before the properties have been assigned (namely, when the while loop body is skipped).
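By contrast, when every path to a use assigns the property, the states joined at that use are all $\operatorname{\mathit{Assigned}}$ and VIA reports no error. A minimal function of our own (the name `sign` is hypothetical) showing this for a val declared without an initializer:

```kotlin
fun sign(n: Int): String {
    val result: String // declared without an initializer: Unassigned
    if (n < 0) {
        result = "negative" // Assigned on this path
    } else {
        result = "non-negative" // and on this one
    }
    // Assigned joined with Assigned is Assigned, so this read is accepted.
    return result
}
```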

See the corresponding section for details.

Some standard-library functions in Kotlin adhere to a specific call contract, which affects how calls to these functions are analyzed in the caller’s control-flow graph. A function’s call contract consists of one or more effects.

There are several kinds of effects:

The calls-in-place effect of a function $F$ for a function-type parameter $P$ specifies that, for every call of $F$, parameter $P$ will also be invoked as a function. This effect may also have one of three invocation types: exactly-once, at-least-once, or at-most-once.

These effects change the CFG fragment produced for a call of $F$ when a lambda expression is supplied as the argument for $P$ . Without any effect, the graph looks like this:

For a function call

Note that control-flow information is passed into the lambda body, but no information is extracted from it. If the corresponding parameter $P$ is introduced with the exactly-once effect, this changes to:

If the corresponding parameter $P$ is introduced with the at-least-once effect, this changes to:

If the corresponding parameter $P$ is introduced with the at-most-once effect, this changes to:

This allows control-flow information to be extracted from the lambda expression according to its invocation policy.
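In Kotlin source, such effects are declared with the contracts DSL; for example, the standard-library function kotlin.run declares `contract { callsInPlace(block, InvocationKind.EXACTLY_ONCE) }`. Because of this exactly-once effect, a val without an initializer may be assigned inside the lambda passed to run and read after the call, as in the following sketch (the function name is ours):

```kotlin
fun initializedInLambda(): Int {
    val x: Int
    // run invokes its lambda exactly once, in place, so this assignment
    // is treated as happening exactly once during the call.
    run { x = 42 }
    // After the call, x is considered assigned.
    return x
}
```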

The returns-implies-condition effect of a function $F$ for a boolean parameter $P$ specifies that if a call to $F$ returns normally, $P$ is assumed to be true. For a function call

this changes the normal CFG fragment, which looks like this:

to look like this:
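In Kotlin source, the standard-library function require declares this effect for its boolean parameter (`contract { returns() implies value }`). If a call to require returns normally, its condition can be assumed to hold afterwards, which enables the smart cast in the following sketch (the function name is ours):

```kotlin
fun stringLength(value: Any?): Int {
    // If require returns normally, `value is String` is assumed to be true.
    require(value is String) { "expected a String, got $value" }
    return value.length // smart cast to String
}
```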
