---
title: "An Introduction to polarisR"
output: rmarkdown::html_vignette
vignette: >
  %\VignetteIndexEntry{An Introduction to polarisR}
  %\VignetteEngine{knitr::rmarkdown}
  %\VignetteEncoding{UTF-8}
---

## Introduction

Welcome to `polarisR`.

This document is a guide to using `polarisR`. We will walk through each tab of the application, explaining the features available to help you explore and understand your high-dimensional data.

<div style="background-color: #f0f8ff; padding: 1.5em; border-radius: 8px; text-align: center; margin: 1em 0;">
<h2 style="margin-top: 0;">What does polarisR stand for?</h2>
<p style="font-size: 1.1em; font-family: monospace; margin-bottom: 0;">
<b>p</b>rojective <b>o</b>utput <b>l</b>ayouts <b>a</b>nd <b>r</b>educed <b>i</b>nteractive <b>s</b>urfaces in <b>R</b>
</p>
</div>


## Application Overview

The polarisR interface is organized into **five main tabs**, each designed for specific aspects of your data analysis workflow:

1. **Dataset Preview** - Load and explore your data, select columns, and manage datasets
2. **Non-linear dimension reduction (NLDR)** - Apply NLDR methods (t-SNE/UMAP) with parameter configuration and visualization
3. **Dynamic Tour** - Explore high-dimensional structure through animated projections
4. **Diagnosing** - Assess embedding quality using quantitative methods
5. **2-D Layout Comparison** - Compare different NLDR configurations and results

Each tab builds upon the previous ones, creating a comprehensive workflow from data loading to advanced comparative analysis. Let's explore each tab in detail.

## Dataset Preview Tab

The **Dataset Preview** tab is the starting point of your analysis. Here, you can load your data, select relevant columns, and get a quick overview of your dataset.

<img src="../man/figures/dataset-preview.png" alt="Dataset Preview Tab" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">

**Features**:

*   **Upload Dataset:** You can upload your own dataset in CSV format using the "Upload Dataset" button. The application will automatically validate the file and handle potential errors.
*   **Example Datasets:** `polarisR` comes with two pre-loaded datasets: `four_clusters` and `pdfsense`. You can select either of these to explore the application's features without needing your own data.

```r
# Access the datasets directly
data(four_clusters, package = "polarisR")
data(pdfsense, package = "polarisR") 

# View dataset information
?four_clusters
?pdfsense
```

**Dataset Descriptions:**

*   **four_clusters**: A synthetic dataset with four distinct clusters, perfect for testing clustering visualization
*   **pdfsense**: A high-energy physics dataset representing parton distribution function fits

**Additional Features:**

*   **Column Selection:** After loading a dataset, you can choose which columns to include in the NLDR analysis. By default, all columns are selected. You can manually select or deselect columns and apply the changes.
*   **Data Preview:** A table displays the first few rows of your dataset, allowing you to inspect the data and ensure it has been loaded correctly.
*   **Dataset Information:** This section provides a summary of your dataset, including the number of rows, columns, and the types of columns (numeric or categorical).
*   **NLDR Datasets:** As you run different NLDR analyses, the results will be stored and listed here. You can easily switch between different results to compare them.
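
As a rough illustration of what the Dataset Information and Column Selection steps amount to, the following base R sketch summarizes a CSV file and keeps only its numeric columns. `summarize_csv()` is a hypothetical helper, not a `polarisR` function, and the built-in `iris` data stands in for an uploaded file.

```r
# Hypothetical helper (not part of polarisR): report rows, columns,
# and which columns are numeric or categorical.
summarize_csv <- function(path) {
  stopifnot(file.exists(path))
  dat <- read.csv(path, stringsAsFactors = FALSE)
  is_num <- vapply(dat, is.numeric, logical(1))
  list(
    n_rows = nrow(dat),
    n_cols = ncol(dat),
    numeric_cols = names(dat)[is_num],
    categorical_cols = names(dat)[!is_num]
  )
}

# Write a small example file so the sketch is self-contained
tmp <- tempfile(fileext = ".csv")
write.csv(iris, tmp, row.names = FALSE)

info <- summarize_csv(tmp)
str(info)

# Keep only the numeric columns, mirroring a manual column selection
nldr_input <- read.csv(tmp)[, info$numeric_cols]
head(nldr_input)

unlink(tmp)  # remove the temporary file
```

Checking the column types before running an analysis is worthwhile because NLDR methods operate on numeric columns, while categorical columns are typically reserved for coloring.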

## Non-linear dimension reduction (NLDR) Tab

The **Non-linear dimension reduction (NLDR)** tab is where the main NLDR analysis happens. You can choose between t-SNE and UMAP, configure their parameters, and visualize the results.

<img src="../man/figures/dataset-visualization.png" alt="Non-linear dimension reduction (NLDR) Tab" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">

**Features**:

*   **Choose Method:** Select either `t-SNE` or `UMAP` as your NLDR method.
*   **t-SNE Parameters:**
    *   **Perplexity:** Adjust the perplexity value, which influences the number of nearest neighbors for each point.
    *   **Max Iterations:** Set the maximum number of iterations for the t-SNE algorithm.
    *   **Auto-adjust perplexity:** Let the application automatically choose a suitable perplexity value based on your data. The formula used is `perplexity = max(5, min(30, floor(n_samples / 3) - 1))`. This keeps perplexity between 5 and 30 and lets it grow with the number of samples, so that small datasets are not assigned neighborhoods larger than they can support.
*   **UMAP Parameters:**
    *   **Number of Neighbors:** Control the size of the local neighborhood UMAP will use.
    *   **Min. Distance:** Set the minimum distance between embedded points.
*   **Color Options:** Choose a column from your dataset to color the points in the visualization. This is useful for identifying clusters or patterns.
*   **Reproducibility Options:** Set a random seed to ensure that your NLDR results are reproducible.
*   **Run Visualization:** Click this button to start the NLDR computation. The progress will be displayed, and the resulting visualization will be shown on the right.
*   **Visualization Information:** This panel displays the parameters used for the current visualization, making it easy to track your experiments.
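
The auto-adjust formula above can be written as a one-line R function, shown here together with a minimal sketch of running both methods outside the application. The choice of the `Rtsne` and `uwot` packages is an assumption made for illustration; this vignette does not state which implementations the application calls, and the parameter values are arbitrary examples.

```r
# Formula quoted above for the auto-adjusted perplexity
auto_perplexity <- function(n_samples) {
  max(5, min(30, floor(n_samples / 3) - 1))
}

# How the value scales with dataset size
sizes <- c(10, 30, 60, 100, 1000)
vapply(sizes, auto_perplexity, numeric(1))
# 5, 9, 19, 30, 30 for the sizes above

# Example data: numeric columns of iris, duplicates removed because
# Rtsne rejects duplicate rows by default
X <- unique(as.matrix(iris[, 1:4]))
perp <- auto_perplexity(nrow(X))

set.seed(123)  # fixed seed so the layouts can be reproduced

# Assumption: Rtsne and uwot are used only as familiar stand-ins
if (requireNamespace("Rtsne", quietly = TRUE)) {
  tsne_layout <- Rtsne::Rtsne(X, dims = 2, perplexity = perp,
                              max_iter = 500)$Y
}
if (requireNamespace("uwot", quietly = TRUE)) {
  umap_layout <- uwot::umap(X, n_neighbors = 15, min_dist = 0.1)
}
```

Both methods are stochastic, which is why setting a seed before each run matters when you want to compare layouts across sessions.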

## Dynamic Tour Tab

The **Dynamic Tour** tab offers an interactive way to explore the high-dimensional space of your data. It provides a dynamic projection of the data, which can be viewed as a scatter plot, sage plot, or slice plot.

<img src="../man/figures/dynamic-tour.png" alt="Dynamic Tour Tab" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">

**Features**:

*   **Select Tour Display:** Choose from three types of dynamic tours:
    *   **Scatter:** A standard scatter plot of the projected data.
    *   **Sage:** A scatter plot display that adjusts for the projected volume, defined in [Laa et al. (2021)](https://doi.org/10.1080/10618600.2021.1963264).
    *   **Slice:** A scatter plot display that highlights points close to the projection plane, defined in [Laa et al. (2020)](https://doi.org/10.1080/10618600.2020.1777140).
*   **Tour Options:**
    *   **Show Axes:** Toggle the visibility of the axes in the tour plot.
    *   **Show Wireframe:** Toggle the visibility of wireframe edges in the tour plot for enhanced structural visualization.
    *   **Point Opacity (Alpha):** Adjust the transparency of the points.
    *   **Gamma (for Sage):** Control the effective dimensionality parameter for the sage plot.
    *   **Slice Relative Volume (for Slice):** Adjust the thickness of the slice.
*   **Enable Linked Brushing:** When enabled, you can select points in the NLDR plot and the corresponding points will be highlighted in the dynamic tour plot, and vice versa. This is a powerful way to explore the relationship between the low-dimensional embedding and the original high-dimensional data.
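
To make the scatter and slice displays more concrete, here is a base R sketch of a single tour frame: the data are projected onto a random 2-D plane, and points close to that plane are highlighted. This is a simplified illustration, not the application's code. In particular, the slice threshold `h` is set from a quantile as a hypothetical stand-in for the relative-volume control, and the sage display is not covered.

```r
set.seed(1)
X <- scale(as.matrix(iris[, 1:4]))  # centered and scaled data (n x p)
p <- ncol(X)

# One tour "frame": a random orthonormal basis for a 2-D plane in p dimensions
B <- qr.Q(qr(matrix(rnorm(p * 2), nrow = p)))

# Scatter display: coordinates of each point projected onto the plane
proj <- X %*% B

# Slice display: orthogonal distance from each point to the plane
# (the plane passes through the origin, i.e. the data center)
resid <- X - proj %*% t(B)
dist_to_plane <- sqrt(rowSums(resid^2))

# Hypothetical threshold: treat the closest 20% of points as "in the slice"
h <- quantile(dist_to_plane, 0.2)
in_slice <- dist_to_plane <= h

# Points inside the slice are drawn dark, the rest are faded
plot(proj, pch = 16, asp = 1, xlab = "Projection 1", ylab = "Projection 2",
     col = ifelse(in_slice, "black", "grey80"))
```

An animated tour repeats this step over a smooth sequence of bases, so the slice membership changes from frame to frame as the plane rotates.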

## Diagnosing Tab

The **Diagnosing** tab provides tools to assess the selected NLDR layout. It uses the [`quollr` package](https://github.com/JayaniLakshika/quollr) to analyze the layout quantitatively and helps you find the optimal binwidth for model fitting.

**Features**:

*   **Binwidth Optimization:** This feature automatically tests a range of binwidths for the `quollr` analysis and finds the optimal configuration based on the Root Mean Square Error (RMSE).
*   **Run Quollr Analysis:** After optimizing the binwidth, you can run the full `quollr` analysis to get a detailed assessment of your embedding.
*   **Analysis Results:** The results of the analysis are presented in several tabs:
    *   **RMSE vs Binwidth:** An interactive plot showing the RMSE for different binwidth values.
    <img src="../man/figures/rmse-vs-binwidth.png" alt="RMSE vs Binwidth" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">
    *   **Model Summary:** A table with the detailed results of the binwidth optimization.
    <img src="../man/figures/optimization-table.png" alt="Optimization Table" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">
    *   **Model Error:** A plot of the NLDR layout, with points colored according to the high-dimensional model error.
    <img src="../man/figures/model-fit.png" alt="Model Fit" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">
*   **Configuration & Summary:** This section displays the optimal binwidth configuration and a summary of the `quollr` model fit.
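
The mechanics of scanning binwidths and comparing RMSE can be sketched in base R. The example below is a toy stand-in and does not call `quollr`, whose model is more elaborate: here the 2-D layout is divided into square bins, each point is predicted by the high-dimensional mean of its bin, and the RMSE is taken over the per-point squared errors. The PCA layout, the square bins, and the binwidth grid are all assumptions made for illustration.

```r
X <- scale(as.matrix(iris[, 1:4]))  # high-dimensional data
layout2d <- prcomp(X)$x[, 1:2]      # stand-in for an NLDR layout

# RMSE of a simple bin-mean model for one binwidth (square bins)
rmse_for_binwidth <- function(bw, layout, highd) {
  # Label each point with the square bin it falls into
  bin_id <- paste(floor(layout[, 1] / bw), floor(layout[, 2] / bw))
  # Replace every value by the mean of its bin, column by column
  fitted <- apply(highd, 2, function(col) ave(col, bin_id))
  sqrt(mean(rowSums((highd - fitted)^2)))
}

# Evaluate a small grid of candidate binwidths
binwidths <- c(0.25, 0.5, 1, 2)
results <- data.frame(
  binwidth = binwidths,
  rmse = vapply(binwidths, rmse_for_binwidth, numeric(1),
                layout = layout2d, highd = X)
)
results

# Configuration with the lowest RMSE
best <- results[which.min(results$rmse), ]
best

plot(results$binwidth, results$rmse, type = "b",
     xlab = "Binwidth", ylab = "RMSE")
```

In this toy model finer bins can only fit the data at least as well as coarser ones, so the smallest binwidth wins. The sketch is meant to show how a grid of binwidths is evaluated and summarized, not to reproduce the values reported in the application.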

## 2-D Layout Comparison Tab

The **2-D Layout Comparison** tab allows you to compare the results of different NLDR layouts. You can compare different methods (t-SNE vs. UMAP) or the same method with different hyper-parameters.


**Features**:

*   **Choose Comparison Type:**
    *   **NLDR Settings Comparison:** Compare the RMSE of different NLDR configurations. This is useful for finding the best method and hyper-parameters for your data.
    <img src="../man/figures/method-comparison1.png" alt="NLDR Settings Comparison" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">
    *   **Side-by-Side Visualization:** Display two NLDR plots next to each other for a direct visual comparison.
    <img src="../man/figures/method-comparison2.png" alt="Side-by-Side Visualization" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">
    *   **Enable Linked Brushing:** In the side-by-side visualization mode, you can enable linked brushing to select points in one plot and see the corresponding points highlighted in the other.
    <img src="../man/figures/method-comparison3.png" alt="Linked Brushing" style="width:700px; margin: 20px auto; display: block; border: 1px solid #ddd; border-radius: 5px; padding: 5px;">
*   **Dataset Selection:** Select the NLDR results you want to compare from the list of stored results.
*   **Run Comparison Plot:** Generate the comparison plot based on your selection.
*   **Best Configuration Summary:** When comparing NLDR settings, this section will show you the best configuration found based on the RMSE.
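
Linked brushing works because both layouts describe the same observations in the same row order, so a selection in one plot is just a set of row indices that can be highlighted in the other. The static base R sketch below shows the idea under stated assumptions: the two layouts are simple stand-ins (PCA and classical scaling) rather than t-SNE or UMAP results, and the brush limits are hypothetical.

```r
X <- scale(as.matrix(iris[, 1:4]))

# Two stand-in 2-D layouts of the same rows
layout_a <- prcomp(X)$x[, 1:2]
layout_b <- cmdscale(dist(X, method = "manhattan"), k = 2)

# Hypothetical rectangular brush covering the left quarter of layout A
brush <- list(
  xmin = min(layout_a[, 1]),
  xmax = unname(quantile(layout_a[, 1], 0.25)),
  ymin = min(layout_a[, 2]),
  ymax = max(layout_a[, 2])
)

# Row indices of the points inside the brush
selected <- which(
  layout_a[, 1] >= brush$xmin & layout_a[, 1] <= brush$xmax &
    layout_a[, 2] >= brush$ymin & layout_a[, 2] <= brush$ymax
)

# The same indices drive the highlight colors in both plots
cols <- ifelse(seq_len(nrow(X)) %in% selected, "red", "grey70")

op <- par(mfrow = c(1, 2))
plot(layout_a, col = cols, pch = 16, main = "Layout A (brushed)")
plot(layout_b, col = cols, pch = 16, main = "Layout B (linked)")
par(op)
```

If the highlighted points stay together in the second layout, the two configurations agree about that group; if they scatter, the layouts disagree about its structure.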

