Connect and retrieve data from databases

Before you start

Connect to database

You can connect to and interact with an SQL database using specific functions from the Kotlin DataFrame library. You can use DatabaseConfiguration to establish a connection to your database and getSchemaForAllSqlTables() to retrieve the schema of all tables within it.

Open your Kotlin Notebook file (.ipynb).
Add a dependency for a JDBC (Java Database Connectivity) driver, and specify the JDBC driver version. This example uses MariaDB:
```
USE {
   dependencies("org.mariadb.jdbc:mariadb-java-client:$version")
}
```
Import the Kotlin DataFrame library, which is essential for data manipulation tasks, along with the necessary Java libraries for SQL connectivity and utility functions:
```
%use dataframe
import java.sql.DriverManager
import java.util.*
```

Use the DatabaseConfiguration class to define your database's connection parameters, including the URL, username, and password:

```
val URL = "YOUR_URL"
val USER_NAME = "YOUR_USERNAME"
val PASSWORD = "YOUR_PASSWORD"

val dbConfig = DatabaseConfiguration(URL, USER_NAME, PASSWORD)
```
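Hard-coded credentials are easy to leak when a notebook is shared. As a minimal sketch, the connection parameters could instead be read from environment variables, and the MariaDB JDBC URL assembled from its parts. The helper names `mariaDbUrl` and `envOr`, the variable names `DB_USER` and `DB_PASSWORD`, and the host, port, and database values are assumptions made for illustration; they are not part of the original example.

```kotlin
// Hypothetical helper: assembles a MariaDB JDBC URL of the form
// jdbc:mariadb://<host>:<port>/<database>.
fun mariaDbUrl(host: String, port: Int = 3306, database: String): String =
    "jdbc:mariadb://$host:$port/$database"

// Hypothetical helper: reads a setting from the environment and
// falls back to a placeholder when the variable is not set.
fun envOr(name: String, fallback: String): String =
    System.getenv(name) ?: fallback

fun main() {
    // "localhost" and "imdb" are placeholder values, not taken from the tutorial.
    val url = mariaDbUrl(host = "localhost", database = "imdb")
    println(url) // prints jdbc:mariadb://localhost:3306/imdb

    // DB_USER and DB_PASSWORD are assumed variable names.
    val userName = envOr("DB_USER", "YOUR_USERNAME")
    val password = envOr("DB_PASSWORD", "YOUR_PASSWORD")
    println("Connecting as $userName (password length: ${password.length})")
}
```

In a notebook, the resulting values would take the place of `URL`, `USER_NAME`, and `PASSWORD` when constructing `DatabaseConfiguration`.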

Once connected, use the getSchemaForAllSqlTables() function to fetch and display the schema information for each table in the database:
```
val dataschemas = DataFrame.getSchemaForAllSqlTables(dbConfig)

dataschemas.forEach {
    println("---Yet another table schema---")
    println(it)
    println()
}
```
Tip: For more information on connecting to SQL databases, see Read from SQL databases in the Kotlin DataFrame documentation.

Retrieve and manipulate data

After establishing a connection to an SQL database, you can retrieve and manipulate data in Kotlin Notebook using the Kotlin DataFrame library. Use the readSqlTable() function to retrieve data, and methods such as filter, groupBy, and convert to manipulate it.

Use the readSqlTable() function to retrieve data from the "movies" table, setting limit to restrict the query to the first 100 records for efficiency:
```
val dfs = DataFrame.readSqlTable(dbConfig, tableName = "movies", limit = 100)
```

Use an SQL query to retrieve a specific dataset related to movies directed by Quentin Tarantino. This query selects movie details and combines genres for each movie:

```
val props = Properties()
props.setProperty("user", USER_NAME)
props.setProperty("password", PASSWORD)

// Retrieves a list of Quentin Tarantino's movies, including their name, year, rank, and a concatenated string of all genres.
// The results are grouped by name, year, rank, and sorted by year.
val TARANTINO_FILMS_SQL_QUERY = """
    SELECT name, year, rank, GROUP_CONCAT(genre) as "genres"
    FROM movies JOIN movies_directors ON movie_id = movies.id
    JOIN directors ON directors.id=director_id LEFT JOIN movies_genres ON movies.id = movies_genres.movie_id
    WHERE directors.first_name = "Quentin" AND directors.last_name = "Tarantino"
    GROUP BY name, year, rank
    ORDER BY year
    """

var dfTarantinoMovies: DataFrame<*>

DriverManager.getConnection(URL, props).use { connection ->
    connection.createStatement().use { st ->
        st.executeQuery(TARANTINO_FILMS_SQL_QUERY).use { rs ->
            val dfTarantinoFilmsSchema = DataFrame.getSchemaForResultSet(rs, connection)
            dfTarantinoFilmsSchema.print()

            dfTarantinoMovies = DataFrame.readResultSet(rs, connection)
            dfTarantinoMovies
        }
    }
}
```
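The query above embeds the director's name directly in the SQL text. If the name came from a variable or from user input, a JDBC `PreparedStatement` with `?` placeholders would be the usual way to pass it. The sketch below assumes the same `movies`, `movies_directors`, and `directors` tables and columns as the query above. The names `DIRECTOR_FILMS_SQL` and `moviesByDirector` are hypothetical, and the function returns plain movie names rather than a DataFrame so that it depends only on `java.sql`.

```kotlin
import java.sql.Connection

// Hypothetical parameterized variant of the query: each ? is bound at execution time.
val DIRECTOR_FILMS_SQL = """
    SELECT movies.name
    FROM movies
    JOIN movies_directors ON movies_directors.movie_id = movies.id
    JOIN directors ON directors.id = movies_directors.director_id
    WHERE directors.first_name = ? AND directors.last_name = ?
    ORDER BY movies.year
"""

// Hypothetical helper: returns the names of all movies by the given director.
// The statement and the result set are closed automatically by `use`.
fun moviesByDirector(connection: Connection, firstName: String, lastName: String): List<String> =
    connection.prepareStatement(DIRECTOR_FILMS_SQL).use { statement ->
        statement.setString(1, firstName) // binds the first ?
        statement.setString(2, lastName)  // binds the second ?
        statement.executeQuery().use { rs ->
            buildList {
                while (rs.next()) add(rs.getString("name"))
            }
        }
    }
```

With a connection opened through `DriverManager.getConnection(URL, props)` as shown earlier, calling `moviesByDirector(connection, "Quentin", "Tarantino")` would run the parameterized query against that connection.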

After fetching the Tarantino movies dataset, you can further manipulate and filter the data.

```
val df = dfTarantinoMovies
    // Replaces any missing values in the 'year' column with 0.
    .fillNA { year }.with { 0 }

    // Converts the 'year' column to integers.
    .convert { year }.toInt()

    // Filters the data to include only movies released after the year 2000.
    .filter { year > 2000 }
df
```
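Note how the first and last steps interact: a missing year becomes 0 and is then dropped by the `year > 2000` filter. The same logic can be sketched with plain Kotlin collections, without a database or the DataFrame library. The `RawMovie` and `Movie` classes, the `recentMovies` function, and the sample rows are invented for illustration.

```kotlin
// Hypothetical row types: the raw year is a nullable string, the cleaned year is an Int.
data class RawMovie(val name: String, val year: String?)
data class Movie(val name: String, val year: Int)

fun recentMovies(rows: List<RawMovie>): List<Movie> =
    rows
        // Fills a missing year with "0", then converts the year to Int.
        .map { Movie(it.name, (it.year ?: "0").toInt()) }
        // Keeps only movies released after 2000; filled-in zeros are dropped here.
        .filter { it.year > 2000 }

fun main() {
    // Invented sample rows, not real query results.
    val sample = listOf(
        RawMovie("Sample A", "1994"),
        RawMovie("Sample B", "2003"),
        RawMovie("Sample C", null),
    )
    println(recentMovies(sample).map { it.name }) // prints [Sample B]
}
```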

Analyze data in Kotlin Notebook

Extract data from the "actors" table using the readSqlTable() function:
```
val actorDf = DataFrame.readSqlTable(dbConfig, "actors", 10000)
```

Process the retrieved data to identify the top 20 most common actor first names. This analysis involves several DataFrame methods:

```
val top20ActorNames = actorDf
    // Groups the data by the first_name column to organize it based on actor first names.
    .groupBy { first_name }

    // Counts the occurrences of each unique first name, providing a frequency distribution.
    .count()

    // Sorts the results in descending order of count to identify the most common names.
    .sortByDesc("count")

    // Selects the top 20 most frequent names for analysis.
    .take(20)
top20ActorNames
```
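To see what this pipeline computes without connecting to a database, the same four steps can be sketched with standard-library collections. This is an analogy under stated assumptions, not the DataFrame implementation: the `topNames` function and the sample names are invented for illustration.

```kotlin
// Hypothetical helper: returns the n most frequent names with their counts.
fun topNames(firstNames: List<String>, n: Int): List<Pair<String, Int>> =
    firstNames
        .groupingBy { it }                 // like groupBy { first_name }
        .eachCount()                       // like count()
        .toList()                          // map entries as (name, count) pairs
        .sortedByDescending { it.second }  // like sortByDesc("count")
        .take(n)                           // like take(20)

fun main() {
    // Invented sample data, not rows from the "actors" table.
    val names = listOf("Ann", "Bob", "Ann", "Cid", "Bob", "Ann")
    println(topNames(names, 2)) // prints [(Ann, 3), (Bob, 2)]
}
```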
