Retrieve data from web sources and APIs

Before you start

Download and install the latest version of IntelliJ IDEA Ultimate.
Install the Kotlin Notebook plugin in IntelliJ IDEA.
Tip: Alternatively, access the Kotlin Notebook plugin from Settings | Plugins | Marketplace within IntelliJ IDEA.
Create a new Kotlin Notebook by selecting File | New | Kotlin Notebook.
In the Kotlin Notebook, import the Kotlin DataFrame library by running the following command:
```
%use dataframe
```

Fetch data from an API

Open your Kotlin Notebook file (.ipynb).
Import the Kotlin DataFrame library, which provides the data manipulation functions used in this tutorial, by running the following command in a code cell:
```
%use dataframe
```
Add your API key in a new code cell. The key authenticates your requests to the YouTube Data API, so keep it private. You can obtain your API key from the credentials tab of the Google Cloud console:
```
val apiKey = "YOUR-API_KEY"
```
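A key that is hardcoded in a cell is saved together with the notebook. One possible alternative, sketched here, reads the key from an environment variable. The variable name `YOUTUBE_API_KEY` and the helper `readApiKey` are assumptions made for this illustration and are not part of the tutorial's code:

```kotlin
// Hypothetical helper: looks up the API key in the given environment map
// (the process environment by default) instead of hardcoding it in a cell.
// The variable name YOUTUBE_API_KEY is an assumption made for this sketch.
fun readApiKey(
    env: Map<String, String> = System.getenv(),
    name: String = "YOUTUBE_API_KEY"
): String {
    // Treat a missing or whitespace-only value as "not set".
    val key = env[name]?.trim().orEmpty()
    // Fail early with a clear message rather than sending unauthenticated requests.
    require(key.isNotEmpty()) { "Environment variable $name is not set" }
    return key
}
```

With this helper in place, the key could be assigned with `val apiKey = readApiKey()`, provided the variable is set in the environment that launches the IDE.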
Create a load function that takes a path as a string and uses the DataRow.read() function to fetch data from the YouTube Data API as a single row:
```
fun load(path: String): AnyRow = DataRow.read("https://www.googleapis.com/youtube/v3/$path&key=$apiKey")
```
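Because the request URL is assembled by appending `&key=...` to the path, the path is expected to already contain a query string starting with `?`. The following sketch shows one way to build such a path with URL-encoded values. The helpers `buildPath` and `requestUrl` are hypothetical names introduced for this illustration; only the base URL and the string layout come from the function above:

```kotlin
import java.net.URLEncoder

// Base URL taken from the load() function in this tutorial.
val youtubeBaseUrl = "https://www.googleapis.com/youtube/v3/"

// Hypothetical helper: builds "endpoint?name=value&..." with URL-encoded values.
fun buildPath(endpoint: String, params: Map<String, String>): String =
    endpoint + "?" + params.entries.joinToString("&") { (name, value) ->
        "$name=${URLEncoder.encode(value, "UTF-8")}"
    }

// Mirrors the string template used by load(): base URL + path + API key.
fun requestUrl(path: String, apiKey: String): String =
    "$youtubeBaseUrl$path&key=$apiKey"
```

Encoding the values matters when a search term contains spaces or other reserved characters, which would otherwise produce a malformed URL.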

Organize the fetched data into rows and handle the YouTube API's pagination through the nextPageToken. This ensures you gather data across multiple pages:

```
fun load(path: String, maxPages: Int): AnyFrame {

    // Initializes a mutable list to store rows of data.
    val rows = mutableListOf<AnyRow>()

    // Sets the initial page path for data loading.
    var pagePath = path
    do {

        // Loads data from the current page path.
        val row = load(pagePath)
        // Adds the loaded data as a row to the list.
        rows.add(row)

        // Retrieves the token for the next page, if available.
        val next = row.getValueOrNull<String>("nextPageToken")
        // Updates the page path for the next iteration, including the new token.
        pagePath = path + "&pageToken=" + next

        // Continues loading pages until there's no next page or maxPages is reached.
    } while (next != null && rows.size < maxPages)

    // Concatenates and returns all loaded rows as a DataFrame.
    return rows.concat()
}
```
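The pagination logic can be tried out without network access. The following is a minimal sketch in plain Kotlin that applies the same token-following loop to an in-memory fake endpoint. The `Page` type, `collectPages`, and `fakeFetch` are hypothetical names introduced for this illustration:

```kotlin
// Hypothetical stand-in for one API response: its items plus an optional token.
data class Page(val items: List<String>, val nextPageToken: String?)

// Follows nextPageToken links until no token is returned or maxPages is reached.
// fetch receives null for the first page and the token for every later page.
fun collectPages(maxPages: Int, fetch: (String?) -> Page): List<Page> {
    val pages = mutableListOf<Page>()
    var token: String? = null
    do {
        val page = fetch(token)
        pages.add(page)
        token = page.nextPageToken
    } while (token != null && pages.size < maxPages)
    return pages
}

// In-memory fake of a paginated endpoint with three pages.
val fakePages: Map<String?, Page> = mapOf(
    null to Page(listOf("a", "b"), "t1"),
    "t1" to Page(listOf("c", "d"), "t2"),
    "t2" to Page(listOf("e"), null)
)

fun fakeFetch(token: String?): Page = fakePages.getValue(token)
```

Calling `collectPages(5, ::fakeFetch)` walks all three fake pages, while `collectPages(2, ::fakeFetch)` stops after the second page because of the page limit.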

Use the previously defined load() function to fetch data and create a DataFrame in a new code cell. This example fetches videos related to Kotlin, with a maximum of 50 results per page and up to 5 pages. The result is stored in the df variable:
```
val df = load("search?q=kotlin&maxResults=50&part=snippet", 5)
df
```
Finally, extract and concatenate items from the DataFrame:
```
val items = df.items.concat()
items
```

Clean and refine data

Start by cleaning your data: drop the rows that have no video ID, keep only the video ID (renamed to id) and the snippet column, and remove duplicate rows:
```
val clean = items.dropNulls { id.videoId }
    .select { id.videoId named "id" and snippet }
    .distinct()
clean
```
Chunk IDs from the cleaned data and load corresponding video statistics. This involves breaking the data into smaller batches and fetching additional details:
```
val statPages = clean.id.chunked(50).map {
    val ids = it.joinToString("%2C")
    load("videos?part=statistics&id=$ids")
}
statPages
```
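The batching step can be illustrated with plain Kotlin lists. This minimal sketch uses the standard library's `chunked` on a `List<String>` and assumes the same batch size and `%2C` (URL-encoded comma) separator as the code above. The helper names `idBatches` and `statisticsPaths` are hypothetical:

```kotlin
// Splits video IDs into batches and joins each batch with an encoded comma (%2C),
// producing one value for the `id` query parameter per request.
fun idBatches(ids: List<String>, batchSize: Int = 50): List<String> =
    ids.chunked(batchSize).map { it.joinToString("%2C") }

// Builds the request path for each batch, in the same format as the tutorial code.
fun statisticsPaths(ids: List<String>, batchSize: Int = 50): List<String> =
    idBatches(ids, batchSize).map { "videos?part=statistics&id=$it" }
```

For example, 120 IDs yield three batches (50, 50, and 20 IDs), so three requests are needed instead of 120.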

Concatenate the fetched statistics and select relevant columns:

```
val stats = statPages.items.concat().select { id and statistics.all() }.parse()
stats
```

Join the existing cleaned data with the newly fetched statistics. This merges two sets of data into a comprehensive DataFrame:
```
val joined = clean.join(stats)
joined
```
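To make the effect of the join more concrete, here is a minimal sketch in plain Kotlin. It assumes that the join matches rows on the shared id column and keeps only rows present on both sides (inner-join behavior). The data classes and the `joinById` function are hypothetical and stand in for the two DataFrames:

```kotlin
// Hypothetical row types standing in for the two DataFrames.
data class Video(val id: String, val title: String)
data class Stats(val id: String, val viewCount: Int)
data class VideoWithStats(val id: String, val title: String, val viewCount: Int)

// Inner join on id: keeps only videos that have a matching statistics row.
fun joinById(videos: List<Video>, stats: List<Stats>): List<VideoWithStats> {
    // Index the statistics by id so each lookup is a constant-time map access.
    val statsById = stats.associateBy { it.id }
    return videos.mapNotNull { video ->
        statsById[video.id]?.let { VideoWithStats(video.id, video.title, it.viewCount) }
    }
}
```

Under this assumption, a video without a statistics row does not appear in the result, which is worth checking if the joined table has fewer rows than expected.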

Analyze data in Kotlin Notebook

Let's look at an example that uses groupBy to categorize videos by channel, sum to calculate the total views per channel, and maxBy to find the most recently published video in each group:

Simplify the access to specific columns by setting up references:
```
val view by column<Int>()
```
Use the groupBy method to group the data by the channel column, then sort the groups by the number of videos they contain:
```
val channels = joined.groupBy { channel }.sortByCount()
```

In the resulting table, you can interactively explore the data. Clicking on the group field of a row corresponding to a channel expands that row to reveal more details about that channel's videos.

Use aggregate, sum, maxBy, and flatten to create a DataFrame summarizing each channel's total views and the details of its most recently published video:

```
val aggregated = channels.aggregate {
    viewCount.sum() into view

    val last = maxBy { publishedAt }
    last.title into "last title"
    last.publishedAt into "time"
    last.viewCount into "viewCount"
    // Sorts the DataFrame in descending order by view count and transforms it into a flat structure.
}.sortByDesc(view).flatten()
aggregated
```
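For comparison, the same summary can be sketched with plain Kotlin collections. This is an illustration under stated assumptions rather than the DataFrame implementation: the row types `ChannelVideo` and `ChannelSummary` are hypothetical, and `publishedAt` is assumed to be an ISO-8601 timestamp in a uniform UTC format, so that string order matches chronological order:

```kotlin
// Hypothetical flat row type; the field names mirror the columns used above.
data class ChannelVideo(
    val channel: String,
    val title: String,
    val publishedAt: String, // assumed ISO-8601 in a uniform UTC format
    val viewCount: Int
)

// One summary row per channel, mirroring the aggregated columns.
data class ChannelSummary(
    val channel: String,
    val view: Int,         // total views of the channel
    val lastTitle: String, // title of the most recently published video
    val time: String,      // publication time of that video
    val viewCount: Int     // views of that video
)

// Groups videos by channel, sums the views, picks the most recently published
// video in each group, and sorts the summaries by total views, descending.
fun summarize(videos: List<ChannelVideo>): List<ChannelSummary> =
    videos.groupBy { it.channel }
        .map { (channel, group) ->
            // Groups produced by groupBy are never empty, so maxBy is safe here.
            val last = group.maxBy { it.publishedAt }
            ChannelSummary(
                channel,
                group.sumOf { it.viewCount },
                last.title,
                last.publishedAt,
                last.viewCount
            )
        }
        .sortedByDescending { it.view }
```

The sketch requires Kotlin 1.7 or later, where `maxBy` returns a non-null value for non-empty collections.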
