Type inference

Kotlin has a concept of type inference for compile-time type information, meaning some type information in the code may be omitted, to be inferred by the compiler. There are two kinds of type inference supported by Kotlin.

Type inference is a type constraint problem, and is usually solved by a type constraint solver. For this reason, type inference is applicable in situations when the type context contains enough information for the type constraint solver to create an optimal constraint system solution w.r.t. type inference problem.

Kotlin also supports flow-sensitive types in the form of smart casts, which have direct effect on type inference. Therefore, we will discuss them first, before talking about type inference itself.

Kotlin introduces a limited form of flow-sensitive typing called smart casts. Flow-sensitive typing means some expressions in the program may introduce changes to the compile-time types of variables. This allows one to avoid unneeded explicit casting of values in cases when their runtime types are guaranteed to conform to the expected compile-time types.

Flow-sensitive typing may be considered a specific instance of traditional data-flow analysis. Therefore, before we discuss it further, we need to establish the data-flow framework, which we will use for smart casts.

We assume our data-flow analysis is run on a classic control-flow graph (CFG) structure, where most non-trivial expressions and statements are simplified and/or desugared.

Our data-flow domain is a map lattice $\operatorname{\text{SmartCastData}}= \operatorname{\text{Expression}}\rightarrow \operatorname{\text{SmartCastType}}$ , where $\operatorname{\text{Expression}}$ is any Kotlin expression and $\operatorname{\text{SmartCastType}}= \operatorname{\text{Type}}\times \operatorname{\text{Type}}$ sublattice is a product lattice of smart cast data-flow facts of the following kind.

The sublattice order, join and meet are defined as follows.

$P_1 \times N_1 \sqsubseteq P_2 \times N_2 \Leftrightarrow P_1 <: P_2 \land N_1 :> N_2$

$\begin{aligned} P_1 \times N_1 \sqcup P_2 \times N_2 &= \operatorname{\text{LUB}}(P_1, P_2) \times \operatorname{\text{GLB}}(N_1, N_2) \\ P_1 \times N_1 \sqcap P_2 \times N_2 &= \operatorname{\text{GLB}}(P_1, P_2) \times \operatorname{\text{LUB}}(N_1, N_2) \end{aligned}$

The data-flow information uses the following transfer functions.

$\begin{alignedat}{3} &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{is}}T) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap(T \times \top)] \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{!is}}T) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap(\top \times T)] \\ \\ &\left[\!\left[x \operatorname{\texttt{as}}T \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap(T \times \top)] \\ &\left[\!\left[x \operatorname{\texttt{!as}}T) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap(\top \times T)] \\ \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{==}}null) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap(\operatorname{\texttt{kotlin.Nothing?}}\times \top)] \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{!\!\!=}}null) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap(\top \times \operatorname{\texttt{kotlin.Nothing?}})] \\ \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{===}}null) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap(\operatorname{\texttt{kotlin.Nothing?}}\times \top)] \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{!\!\!==}}null) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap(\top \times \operatorname{\texttt{kotlin.Nothing?}})] \\ \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{==}}y) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap s(y), \\ & && &&y \rightarrow s(x) \sqcap s(y)] \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{!\!\!=}}y) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap\operatorname{\mathit{swap}}(\operatorname{\mathit{isNullable}}(s(y))), \\ & && &&y \rightarrow s(y) \sqcap\operatorname{\mathit{swap}}(\operatorname{\mathit{isNullable}}(s(x)))] \\ \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{===}}y) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap s(y), \\ & && &&y \rightarrow s(x) \sqcap s(y)] \\ &\left[\!\left[\operatorname{\mathit{assume}}(x \operatorname{\texttt{!\!\!==}}y) \right]\!\right](s) &&= s[&&x \rightarrow s(x) \sqcap\operatorname{\mathit{swap}}(\operatorname{\mathit{isNullable}}(s(y))), \\ & && &&y \rightarrow s(y) \sqcap\operatorname{\mathit{swap}}(\operatorname{\mathit{isNullable}}(s(x)))] \\ \\ &\left[\!\left[x = y \right]\!\right](s) &&= s[&&x \rightarrow s(y)] \\ \\ &\left[\!\left[\operatorname{\mathit{killDataFlow}}(x) \right]\!\right](s) &&= s[&&x \rightarrow (\top \times \top)] \\ \\ &\left[\!\left[l \right]\!\right](s) &&= &&\bigsqcup_{p \in predecessor(l)} \left[\!\left[p \right]\!\right](s) \end{alignedat}$

where

$\begin{alignedat}{1} \operatorname{\mathit{swap}}(P \times N) &= N \times P \\ \operatorname{\mathit{isNullable}}(s) &= \left. \begin{cases} (\operatorname{\texttt{kotlin.Nothing?}}\times \top) & \text{if } s \sqsubseteq (\operatorname{\texttt{kotlin.Nothing?}}\times \top) \\ (\top \times \top) & \text{otherwise} \end{cases} \right. \end{alignedat}$

After the data-flow analysis is done, for a program location $l$ we have its data-flow information $\left[\!\left[l \right]\!\right]$ , which contains data-flow facts $\left[\!\left[l \right]\!\right][e] = (P \times N)$ for an expression $e$ .

The data-flow information is used to produce the smart cast type as follows.

First, smart casts may influence the compile-time type of an expression $e$ (called smart cast sink) only if the sink is stable.

Second, for a stable smart cast sink $e$ we calculate the overapproximation of its possible type.

$\left[\!\left[l \right]\!\right][e] = (P \times N) \Rightarrow \operatorname{\mathit{smartCastTypeOf}}(e) = \operatorname{\mathit{typeOf}}(e) \mathbin{\operatorname{\&}}P \mathbin{\operatorname{\&}}\operatorname{\mathit{approxNegationType}}(N)$

$\operatorname{\mathit{approxNegationType}}(N) = \left. \begin{cases} \operatorname{\texttt{kotlin.Any}}& \text{if } \operatorname{\texttt{kotlin.Nothing?}}<: N \\ \operatorname{\texttt{kotlin.Any?}}& \text{otherwise} \end{cases} \right.$

As a result, $\operatorname{\mathit{smartCastTypeOf}}(e)$ is used as a compile-time type of $e$ for most purposes (including, but not limited to, function overloading and type inference of other values).

Smart casts are introduced by the following Kotlin constructions.

A smart cast sink is stable for smart casting if its value cannot be changed via means external to the CFG; this guarantees the smart cast conditions calculated by the data-flow analysis still hold at the sink. This is one of the necessary conditions for smart cast to be applicable to an expression.

Smart cast sink stability breaks in the presence of the following aspects.

The following smart cast sinks are considered stable.

We will call redefinition of $e$ direct redefinition, if it happens in the same declaration scope as the definition of $e$ . If $e$ is redefined in a nested declaration scope (w.r.t. its definition), this is a nested redefinition.

We define direct and nested smart cast sinks in a similar way.

A mutable local property $P$ defined at $D$ is considered effectively immutable at a direct sink $S$ , if there are no nested redefinitions on any CFG path between $D$ and $S$ .

A mutable local property $P$ defined at $D$ is considered effectively immutable at a nested sink $S$ , if there are no nested redefinitions of $P$ and all direct redefinitions of $P$ precede $S$ in the CFG.

As mentioned before, a compiler may use $\operatorname{\mathit{killDataFlow}}$ instructions in loops to avoid slow data-flow analysis convergence. In the general case, a loop body may be evaluated zero or more times, which, combined with $\operatorname{\mathit{killDataFlow}}$ instructions, causes the smart cast sources from the loop body to not propagate to the containing scope. However, some loops, for which we can have static guarantees about how their body is evaluated, may be handled differently. For the following loop configurations, we consider their bodies to be definitely evaluated one or more times.

In some cases, it is possible to introduce smart casting between properties if it is known at compile-time that these properties are bound to each other. For instance, if a variable a is initialized as a copy of variable b and both are stable, they are guaranteed to reference the same runtime value and any assumption about a may be also applied to b and vice versa.

In more complex cases, however, it may not be trivial to deduce that two (or more) properties point to the same runtime object. This relation is known as must-alias relation between program references and it is implementation-defined in which cases a particular Kotlin compiler may safely assume this relation holds between two particular properties at a particular program point. However, it must guarantee that if two properties are considered bound, it is impossible for these properties to reference two different values at runtime.

One way of implementing bound smart casts would be to divide the space of stable program properties into disjoint alias sets of properties, and the analysis described above links the smart cast data flow information to sets of properties instead of single properties.

Such view could be further refined by considering special alias sets separately; e.g., an alias set of definitely non-null properties, which would allow the compiler to infer that a?.b !== null implies a !== null (for non-nullable b).

Local type inference in Kotlin is the process of deducing the compile-time types of expressions, lambda expression parameters and properties. As previously mentioned, type inference is a type constraint problem, and is usually solved by a type constraint solver.

In addition to the types of intermediate expressions, local type inference also performs deduction and substitution for generic type parameters of functions and types involved in every expression. You can use the Expressions part of this specification as a reference point on how the types for different expressions are constructed.

Type inference in Kotlin is bidirectional; meaning the types of expressions may be derived not only from their arguments, but from their usage as well. Note that, albeit bidirectional, this process is still local, meaning it processes one statement at a time, strictly in the order of their appearance in a scope; e.g., the type of property in statement $S_1$ that goes before statement $S_2$ cannot be inferred based on how $S_1$ is used in $S_2$ .

As solving a type constraint system is not a definite process (there may be more than one valid solution for a given constraint system), type inference may create several valid solutions. In particular, one may always derive a constraint $A <: T <: B$ for every free type variable $T$ , where types $A$ and $B$ are both valid solutions.

In these cases an optimal constraint system solution is picked w.r.t. local type inference.

Function signature type inference is a variant of local type inference, which is performed for function declarations, lambda literals and anonymous function declarations.

As described here, a named function declaration body may come in two forms: an expression body (a single expression) or a control structure body. For the latter case, an expected return type must be provided or is assumed to be kotlin.Unit and no special kind of type inference is needed. For the former case, an expected return type may be provided or can be inferred using local type inference from the expression body. If the expected return type is provided, it is used as an expected constraint on the result type of the expression body.

Complex statements involving one or more lambda literals introduce an additional level of complexity to type inference and overload resolution mechanisms. As mentioned in the overload resolution section, the overload resolution of callables involved in such statements is performed regardless of the contents of the lambda expressions and before any processing of their bodies is performed (including local type inference).

For a complex statement $S$ involving (potentially overloaded) callables $C_1, \ldots, C_N$ and lambda literals $L_1, \ldots, L_M$ , excluding the bodies of these literals, they are processed as follows.

The external constraints on lambda parameters, return value and body may come from the following sources:

Bare type argument inference is a special kind of type inference where, given a type $T$ and a constructor $TC$ , the type arguments $A_0, A_1 \ldots A_N$ are inferred such that $TC[A_0, A_1 \ldots A_N] <: T$ . It is used together with bare types syntax sugar that can be employed in type checking and casting operators. The process is performed as follows.

First, let’s consider the simple case of $T$ being non-nullable, non-intersection type. Then, a simple type constraint system is constructed by introducing type variables for $A_0, A_1 \ldots A_N$ and then solving the constraint $TC[A_0, A_1 \ldots A_N] <: T$ .

If $T$ is an intersection type, the same process is performed for every member of the intersection type individually and then the resulting type argument values for each parameter $A_K$ are merged using the following principle:

If $T$ is a nullable type $U?$ , the steps given above are performed for its non-nullable counterpart type $U$ .

When working with DSLs that have generic builder functions, one may want to infer the generic builder type parameters using the information from the builder’s lambda body. Kotlin supports special kind of type inference called builder-style type inference to allow this in some cases.

In order to allow builder-style inference for a generic builder function and its type parameter P, it should satisfy the following requirements:

In essence, the builder-style inference allows the type of the lambda parameter receiver to be inferred from its usage in the lambda body. This is performed only if the standard type inference cannot infer the required types, meaning one could provide additional type information to help with the inference, e.g., via explicit type arguments, and avoid the need for builder-style inference.

If the builder-style inference is needed, for a call to an eligible function with a lambda parameter, the inference is performed as described above, but the type arguments of the lambda parameter receiver are viewed as postponed type variables till the body of the lambda expression is proceeded.

After the inference of statements inside the lambda is complete, these postponed type variables are inferred using an additional type inference step, which takes the resulting type constraint system and tries to find the instantiation of the postponed type variables to concrete types.

If the system cannot be solved, it is a compile-time error.

Builder-style inference has the following important restrictions.

Kotlin language specification

Smart casts

Data-flow framework

Smart cast lattices

Smart cast transfer functions

Smart cast types

Smart cast sink stability

Effectively immutable smart cast sinks

Loop handling

Bound smart casts

Local type inference

Function signature type inference

Named and anonymous function declarations

Statements with lambda literals

Bare type argument inference

Builder-style type inference

Type inference Load tests

Smart casts Load tests

Data-flow framework Load tests

Smart cast lattices Load tests

Smart cast transfer functions Load tests

Smart cast types Load tests

Smart cast sink stability Load tests

Effectively immutable smart cast sinks Load tests

Loop handling Load tests

Bound smart casts Load tests

Local type inference Load tests

Function signature type inference Load tests

Named and anonymous function declarations Load tests

Statements with lambda literals Load tests

Bare type argument inference Load tests

Builder-style type inference Load tests