Retrieve data from files

Kotlin Notebook, coupled with the Kotlin DataFrame library, enables you to work with both non-structured and structured data. This combination offers the flexibility to transform non-structured data, such as data found in TXT files, into structured datasets.

For data transformations, you can use methods such as .add(), .split(), .convert(), and .parse(). This toolset also lets you retrieve and manipulate data from various structured file formats, including CSV, JSON, XLS, Parquet, and Apache Arrow. See all supported formats in the DataFrame documentation.

In this guide, you can learn how to retrieve, refine, and handle data through multiple examples.

Before you start

Kotlin Notebook relies on the Kotlin Notebook plugin, which is bundled and enabled in IntelliJ IDEA by default.

If the Kotlin Notebook features are not available, ensure the plugin is enabled. For more information, see Set up an environment.

To follow this tutorial:

  1. Create a new Kotlin Notebook.

  2. Import Kotlin DataFrame:

    %use dataframe

Retrieve data

To retrieve data from a file into your Kotlin Notebook, use the DataFrame.read() function:

val movies = DataFrame.read("movies.csv")

The DataFrame.read() function detects the input format based on the file extension and content.

You can also pass additional arguments to control how the DataFrame library reads the input data. For example, the following code specifies a custom delimiter (;) for a CSV file:

val movies = DataFrame.read("movies.csv", delimiter = ';')

Display data

Once you have the data in your notebook, you can display it. The easiest way is to store your data in a variable and then return it:

val jsonDf = DataFrame.read("jsonFile.json")
jsonDf

This code displays the data from your file as an interactive table:

[Image: Display data]

You can use this view to inspect values, check column names, and easily understand the state of your dataset.

Inspect data structure

To gain insights into the structure or schema of your data, use the .schema() function on your DataFrame variable.

For example, run jsonDf.schema() to list the type of each column in your JSON dataset:

[Image: Schema example]
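If you do not have a JSON file at hand, the following sketch builds a small DataFrame in memory and prints its schema. It assumes that %use dataframe has already been run in the notebook. The sample variable and its rows are hypothetical and are not part of the tutorial data.

```kotlin
// Assumes `%use dataframe` has been run in an earlier cell.
// The column names and rows below are hypothetical sample data.
val sample = dataFrameOf("title", "year")(
    "Toy Story", 1995,
    "Titanic", 1997,
)

// Returning the schema as the last expression displays each column with its type.
sample.schema()
```

The same call works on any DataFrame variable, including one loaded with DataFrame.read().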

With Kotlin Notebook, you can also use the autocompletion feature. It allows you to quickly access and manipulate the properties of your DataFrame. After loading your data, simply type the DataFrame variable followed by a period (.) to see a list of available columns and their types.

[Image: Available properties]

Refine data

Kotlin DataFrame provides various operations for refining your dataset, such as grouping, filtering, updating, and adding new columns. These functions are essential for data analysis, allowing you to organize, clean, and transform your data effectively.

For example, let's look at the movies.csv dataset. It stores movie titles and release years in the same cell. The goal is to refine this dataset for easier analysis:

  1. Load the data

    Load the file into a DataFrame using the .read() function:

    val movies = DataFrame.read("movies.csv")
  2. Add a column

    To extract the release year from the title column, add a new year column:

    val moviesWithYear = movies
        .add("year") {
            "\\d{4}".toRegex()
                .findAll(title)
                .lastOrNull()
                ?.value
                ?.toInt()
                ?: -1
        }
    moviesWithYear
  3. Update values

    To remove the release year from the movie title, update the title column:

    val moviesTitle = moviesWithYear
        .update("title") {
            "\\s*\\(\\d{4}\\)\\s*$".toRegex().replace(title, "")
        }
    moviesTitle

    The code keeps the movie titles in one column and moves the release years into another column.

  4. Filter rows

    To focus on specific data, use the .filter() function. For example, to keep only the movies released in 1996 or later, run:

    val newMovies = moviesTitle.filter { year >= 1996 }
    newMovies
  5. Remove column

    To remove a column that you do not need, use the .remove() function:

    val refinedMovies = newMovies.remove { movieID }
    refinedMovies
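To see what the two regular expressions from the Add a column and Update values steps do, you can try them on plain strings first. The following is a minimal sketch that uses only the Kotlin standard library. The sample titles and the extractYear() and stripYear() helper names are hypothetical, and the real movies.csv file may format its titles differently.

```kotlin
// Hypothetical sample titles; the real dataset may look different.
val sampleTitles = listOf(
    "Toy Story (1995)",
    "2001: A Space Odyssey (1968)",
    "Untitled Project",
)

// The same patterns as in the steps above.
val yearRegex = "\\d{4}".toRegex()
val yearSuffixRegex = "\\s*\\(\\d{4}\\)\\s*$".toRegex()

// Take the last four-digit run as the year, or -1 if the title has none.
fun extractYear(title: String): Int =
    yearRegex.findAll(title).lastOrNull()?.value?.toInt() ?: -1

// Drop a trailing "(YYYY)" together with the whitespace around it.
fun stripYear(title: String): String =
    yearSuffixRegex.replace(title, "")

val refined = sampleTitles.map { stripYear(it) to extractYear(it) }
println(refined)
// prints [(Toy Story, 1995), (2001: A Space Odyssey, 1968), (Untitled Project, -1)]
```

Taking the last match rather than the first keeps titles that start with a four-digit number from being misread, and the -1 fallback marks rows where no year was found.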

For comparison, here is the dataset before refinement:

[Image: Original dataset]

The dataset after refinement:

[Image: Data refinement result]

Export data

After refining data in Kotlin Notebook, you can easily export your processed data.

You can use a variety of .write() functions for this purpose. They support saving in multiple formats, including CSV, JSON, XLS, XLSX, Apache Arrow, and even HTML tables. See all supported formats in the DataFrame documentation. Exporting is particularly useful for sharing your findings, creating reports, or making your data available for further analysis.

For example, let's save the result as:

  • JSON file using the .writeJson() function:

    refinedMovies.writeJson("movies.json")
  • CSV file using the .writeCsv() function:

    refinedMovies.writeCsv("movies.csv")
  • Apache Arrow files using the .writeArrowIPC() and .writeArrowFeather() functions:

    refinedMovies.writeArrowIPC("movies.arrow")
    refinedMovies.writeArrowFeather("movies.feather")
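A quick way to check an export is to read the file back and inspect it. The following sketch assumes that %use dataframe has already been run. The sampleMovies rows and the sample-movies.json file name are hypothetical and stand in for your refined dataset.

```kotlin
// Assumes `%use dataframe` has been run in an earlier cell.
// Hypothetical sample rows standing in for the refined dataset.
val sampleMovies = dataFrameOf("title", "year")(
    "Toy Story", 1995,
    "Titanic", 1997,
)

// Write the sample to a JSON file, then load it again with the generic reader.
sampleMovies.writeJson("sample-movies.json")
val restored = DataFrame.read("sample-movies.json")

// Display the schema of the reloaded data to confirm the columns survived the round trip.
restored.schema()
```

Column types in the reloaded DataFrame are inferred from the file, so they may not match the original types exactly for every format.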

You can also open a standalone HTML table in your browser with the .toStandaloneHTML() function:

refinedMovies
    .toStandaloneHTML(DisplayConfiguration(rowsLimit = null))
    .openInBrowser()

13 May 2026