Retrieve data from files
Kotlin Notebook, coupled with the Kotlin DataFrame library, enables you to work with both non-structured and structured data. This combination offers the flexibility to transform non-structured data, such as data found in TXT files, into structured datasets.
For data transformations, you can use such methods as .add(), .split(), .convert(), and .parse(). Additionally, this toolset enables the retrieval and manipulation of data from various structured file formats, including CSV, JSON, XLS, Parquet, and Apache Arrow. See all supported formats in the DataFrame documentation.
In this guide, you can learn how to retrieve, refine, and handle data through multiple examples.
Before you start
Kotlin Notebook relies on the Kotlin Notebook plugin, which is bundled and enabled in IntelliJ IDEA by default.
If the Kotlin Notebook features are not available, ensure the plugin is enabled. For more information, see Set up an environment.
To follow this tutorial:
Create a new Kotlin Notebook.
Import Kotlin DataFrame:
%use dataframe
Retrieve data
To retrieve data from a file into your Kotlin Notebook, use the DataFrame.read() function and pass it the path to your file, for example DataFrame.read("movies.csv").
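As a self-contained sketch, the cell below first writes a small, hypothetical CSV file to a temporary location and then loads it with DataFrame.read(). It assumes the Kotlin DataFrame library is available, as it is after %use dataframe, which also makes the explicit imports optional. With your own data, pass the file path directly.

```kotlin
import java.io.File
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.read

// Hypothetical sample file; in practice, pass the path of your own file, for example "movies.csv"
val file = File.createTempFile("movies", ".csv").apply {
    writeText("movieId,title,genres\n1,Toy Story (1995),Animation\n2,Heat (1995),Action\n")
    deleteOnExit()
}

// read() picks the CSV reader for this file based on its .csv extension
val movies = DataFrame.read(file.path)
movies
```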
The DataFrame.read() function detects the input format based on the file extension and content.
You can also pass additional arguments to control how the DataFrame library reads the input data. For example, you can specify a custom delimiter (;) for a CSV file.
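A minimal sketch of reading a semicolon-separated file, assuming a recent Kotlin DataFrame version that provides DataFrame.readCsv() with a delimiter parameter (older versions call the function readCSV()). The file contents are hypothetical.

```kotlin
import java.io.File
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.io.readCsv

// Hypothetical semicolon-separated sample file
val file = File.createTempFile("movies", ".csv").apply {
    writeText("movieId;title\n1;Toy Story (1995)\n2;Heat (1995)\n")
    deleteOnExit()
}

// With the default comma delimiter, each line would end up in a single column
val moviesDf = DataFrame.readCsv(file, delimiter = ';')
moviesDf
```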
Display data
Once you have the data in your notebook, you can display it. The easiest way is to store your data in a variable and return that variable as the last line of a cell.
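A minimal display cell might look like the following. To keep it runnable without a file, it builds a small hypothetical frame with dataFrameOf(); a frame loaded with DataFrame.read() is displayed the same way.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf

// Hypothetical sample data: two columns, two rows
val df = dataFrameOf("title", "year")(
    "Toy Story", 1995,
    "Heat", 1995,
)

// Kotlin Notebook renders the value of a cell's last expression, here the DataFrame itself
df
```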
Kotlin Notebook then displays the data from your file as an interactive table:

You can use this view to inspect values, check column names, and easily understand the state of your dataset.
Inspect data structure
To gain insights into the structure or schema of your data, use the .schema() function on your DataFrame variable.
For example, if jsonDf holds a dataset read from a JSON file, run jsonDf.schema() to list the type of each column:
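For a quick experiment without a JSON file, the sketch below parses a hypothetical JSON string with DataFrame.readJsonStr() and prints its schema. It assumes the JSON support that comes with the full Kotlin DataFrame library loaded by %use dataframe.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.schema
import org.jetbrains.kotlinx.dataframe.io.readJsonStr

// Hypothetical JSON records with one string field and one numeric field
val jsonDf = DataFrame.readJsonStr(
    """[{"title": "Toy Story", "year": 1995}, {"title": "Heat", "year": 1995}]"""
)

// Prints each column together with the type inferred from the JSON values
println(jsonDf.schema())
```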

Kotlin Notebook also supports autocompletion, which lets you quickly access the properties of your DataFrame. After loading your data, type the DataFrame variable followed by a period (.) to see a list of available columns and their types.

Refine data
Kotlin DataFrame provides various operations for refining your dataset, such as grouping, filtering, updating values, and adding new columns. These functions are essential for data analysis, allowing you to organize, clean, and transform your data effectively.
For example, let's look at the movies.csv dataset. It stores movie titles and release years in the same cell. The goal is to refine this dataset for easier analysis:
Load the data
Load the file into a DataFrame using the .read() function:
val movies = DataFrame.read("movies.csv")
Add a column
To extract the release year from the title column, add a new year column:
val moviesWithYear = movies
    .add("year") {
        "\\d{4}".toRegex()
            .findAll(title)
            .lastOrNull()
            ?.value
            ?.toInt()
            ?: -1
    }
moviesWithYear
Update values
To remove the release year from the movie title, update the title column:
val moviesTitle = moviesWithYear
    .update("title") {
        "\\s*\\(\\d{4}\\)\\s*$".toRegex().replace(title, "")
    }
moviesTitle
After this step, the title column contains only the movie titles, and the release years are stored separately in the year column.
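Both regular expressions can be tried on individual titles without a DataFrame. The sketch below copies them into two hypothetical helper functions, extractYear() and stripYear(), that use only the Kotlin standard library; the sample titles are illustrative.

```kotlin
// Hypothetical helpers that reuse the regular expressions from the add() and update() steps.

// Takes the last run of four digits as the release year, or -1 if there is none.
fun extractYear(title: String): Int =
    "\\d{4}".toRegex()
        .findAll(title)
        .lastOrNull()
        ?.value
        ?.toInt()
        ?: -1

// Removes a trailing "(YYYY)" together with the whitespace around it.
fun stripYear(title: String): String =
    "\\s*\\(\\d{4}\\)\\s*$".toRegex().replace(title, "")

for (title in listOf("Toy Story (1995)", "2001: A Space Odyssey (1968)", "Untitled")) {
    println("${stripYear(title)} -> ${extractYear(title)}")
}
```

Taking the last four-digit match, rather than the first, matters for titles that contain other four-digit numbers, such as 2001: A Space Odyssey (1968). Titles without a year get the placeholder -1 and keep their original text.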
Filter rows
To focus on specific data, use the .filter() function. For example, to keep only the movies released in 1996 or later, run:
val newMovies = moviesTitle.filter { year >= 1996 }
newMovies
Remove a column
To remove a column that you do not need, use the .remove() function:
val refinedMovies = newMovies.remove { movieID }
refinedMovies
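If the notebook-generated column properties such as title and year are not available, for example in a plain Kotlin project, the same steps can be written with the String API ("title"<String>()). The sketch below chains the add, update, filter, and remove steps on a small hypothetical sample with the movies.csv layout, assuming the Kotlin DataFrame library is on the classpath. It uses the update(...).with { } form of update.

```kotlin
import org.jetbrains.kotlinx.dataframe.api.*

// Hypothetical sample rows with the movies.csv layout (movieId, title, genres)
val movies = dataFrameOf("movieId", "title", "genres")(
    1, "Toy Story (1995)", "Animation",
    2, "Heat (1995)", "Action",
    3, "Fargo (1996)", "Crime",
    4, "The Matrix (1999)", "Sci-Fi",
    5, "Untitled", "Drama",
)

val refinedMovies = movies
    // New "year" column: last four-digit number in the title, or -1 if there is none
    .add("year") {
        "\\d{4}".toRegex().findAll("title"<String>()).lastOrNull()?.value?.toInt() ?: -1
    }
    // Strip the trailing " (YYYY)" from each title; `it` is the current cell value
    .update("title").with { "\\s*\\(\\d{4}\\)\\s*$".toRegex().replace(it as String, "") }
    // Keep movies from 1996 onward; rows with the -1 placeholder are dropped as well
    .filter { "year"<Int>() >= 1996 }
    // Drop the ID column, which is not needed for the analysis
    .remove("movieId")

refinedMovies
```

Because the condition is year >= 1996, Fargo (1996) stays in the result, while the title without a year, whose placeholder is -1, is filtered out.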
For comparison, here is the dataset before refinement:

The dataset after refinement:

Export data
After refining data in Kotlin Notebook, you can easily export your processed data.
For this purpose, you can use a variety of write functions, such as .writeJson() and .writeCsv(). Together, they support saving in multiple formats, including CSV, JSON, XLS, XLSX, Apache Arrow, and even HTML tables. See all supported formats in the DataFrame documentation. Exporting is useful for sharing your findings, creating reports, or making your data available for further analysis.
For example, let's save the result as:
A JSON file, using the .writeJson() function:
refinedMovies.writeJson("movies.json")
A CSV file, using the .writeCsv() function:
refinedMovies.writeCsv("movies.csv")
Apache Arrow files, using the .writeArrowIPC() and .writeArrowFeather() functions:
refinedMovies.writeArrowIPC("movies.arrow")
refinedMovies.writeArrowFeather("movies.feather")
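To look at what a JSON export contains without writing a file, you can render the frame as a String with toJson() and parse it back with DataFrame.readJsonStr(). The two-row frame below is hypothetical and stands in for refinedMovies.

```kotlin
import org.jetbrains.kotlinx.dataframe.DataFrame
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.io.readJsonStr
import org.jetbrains.kotlinx.dataframe.io.toJson

// Hypothetical stand-in for the refined dataset
val refinedMovies = dataFrameOf("title", "year")(
    "Fargo", 1996,
    "The Matrix", 1999,
)

// Renders the frame as JSON text, one object per row
val json = refinedMovies.toJson()
println(json)

// Parsing the text again restores the same columns and rows
val roundTrip = DataFrame.readJsonStr(json)
```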
You can also render the data as a standalone HTML table with the .toStandaloneHTML() function and open it in your browser.
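A sketch of the standalone HTML export, assuming the Kotlin DataFrame HTML API (toStandaloneHTML(), DisplayConfiguration, and openInBrowser()) and a hypothetical stand-in for refinedMovies. Setting rowsLimit = null renders all rows rather than a preview, and the browser is only opened when the environment supports it.

```kotlin
import java.awt.Desktop
import org.jetbrains.kotlinx.dataframe.api.dataFrameOf
import org.jetbrains.kotlinx.dataframe.io.DisplayConfiguration
import org.jetbrains.kotlinx.dataframe.io.toStandaloneHTML

// Hypothetical stand-in for the refined dataset
val refinedMovies = dataFrameOf("title", "year")(
    "Fargo", 1996,
    "The Matrix", 1999,
)

// rowsLimit = null renders every row instead of the default preview limit
val html = refinedMovies.toStandaloneHTML(DisplayConfiguration(rowsLimit = null))

// Opening a browser needs a desktop environment, so skip it when running headless
if (Desktop.isDesktopSupported() && Desktop.getDesktop().isSupported(Desktop.Action.BROWSE)) {
    html.openInBrowser()
}
```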
What's next
Explore data visualization using the Kandy library
Find additional information about data visualization in Data visualization in Kotlin Notebook with Kandy
For an extensive overview of tools and resources available for data science and analysis in Kotlin, see Kotlin and Java libraries for data analysis