Deployed 2614c9f with MkDocs version: 1.6.1
github-actions[bot] committed Oct 18, 2024
0 parents commit 7a1e967
Showing 63 changed files with 13,368 additions and 0 deletions.
Empty file added .nojekyll
Empty file.
543 changes: 543 additions & 0 deletions 404.html

114 changes: 114 additions & 0 deletions articles/get-started.qmd
@@ -0,0 +1,114 @@
---
title: "Get started"
format: gfm
eval: false
---

!!! tip

    To run the code from this article as a Python script:

    ```bash
    python3 examples/get-started.py
    ```

## Import packages

```{python}
import torch
from tinytopics.fit import fit_model
from tinytopics.plot import plot_loss, plot_structure, plot_top_terms
from tinytopics.utils import (
    set_random_seed,
    generate_synthetic_data,
    align_topics,
    sort_documents,
)
```

Set the random seed so that results are reproducible.

```{python}
set_random_seed(42)
```
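
As a rough illustration of what seeding buys you, the sketch below uses Python's standard `random` module as a stand-in. `set_random_seed` presumably seeds the generators that tinytopics relies on, but that is an assumption; the helper `draw_numbers` is hypothetical and not part of the package.

```python
import random


def draw_numbers(seed, count=3):
    """Return `count` pseudo-random floats from a generator seeded with `seed`."""
    rng = random.Random(seed)  # private generator, so global state is untouched
    return [rng.random() for _ in range(count)]


first_run = draw_numbers(42)
second_run = draw_numbers(42)
other_seed = draw_numbers(7)

print(first_run == second_run)  # same seed, same sequence
print(first_run == other_seed)  # a different seed gives a different sequence
```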

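Generate a synthetic count matrix `X` with `n` documents, `m` terms, and `k` topics, along with the true document-topic matrix `true_L` and topic-term matrix `true_F`.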

```{python}
n, m, k = 5000, 1000, 10
X, true_L, true_F = generate_synthetic_data(n, m, k, avg_doc_length=256 * 256)
```
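
To make the shapes concrete, here is a minimal standard-library sketch of how topic-model data of this kind could be simulated. It is not the implementation of `generate_synthetic_data`: the function `toy_synthetic_data`, the Dirichlet draws, and the fixed document length are all assumptions made for illustration.

```python
import random


def dirichlet(rng, size, alpha=1.0):
    """Draw one probability vector of length `size` by normalizing gamma draws."""
    draws = [rng.gammavariate(alpha, 1.0) for _ in range(size)]
    total = sum(draws)
    return [d / total for d in draws]


def toy_synthetic_data(n, m, k, doc_length, seed=0):
    """Simulate counts X (n x m) from proportions L (n x k) and F (k x m)."""
    rng = random.Random(seed)
    L = [dirichlet(rng, k) for _ in range(n)]  # document-topic proportions
    F = [dirichlet(rng, m) for _ in range(k)]  # topic-term probabilities
    X = []
    for doc in L:
        # Term probabilities for this document: a mixture of the topic rows.
        probs = [sum(doc[t] * F[t][j] for t in range(k)) for j in range(m)]
        counts = [0] * m
        for j in rng.choices(range(m), weights=probs, k=doc_length):
            counts[j] += 1
        X.append(counts)
    return X, L, F


X_toy, L_toy, F_toy = toy_synthetic_data(n=6, m=20, k=3, doc_length=50)
print(len(X_toy), len(X_toy[0]))  # documents, terms
print(sum(X_toy[0]))              # total word count of the first document
```

Every row of `L_toy` and `F_toy` sums to one, which is the same constraint the softmax step later in this article imposes on the learned matrices.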

## Training

Train the model

```{python}
model, losses = fit_model(X, k)
```
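
This article does not say what `fit_model` minimizes. A common objective when factorizing a count matrix is the Poisson negative log-likelihood, and the sketch below assumes that choice purely for illustration. The function `poisson_nmf_loss` and the toy matrices are hypothetical.

```python
import math


def poisson_nmf_loss(X, L, F, eps=1e-10):
    """Mean Poisson negative log-likelihood of counts X under rates L @ F.

    The log(x!) term is dropped because it does not depend on L or F.
    """
    n, m, k = len(X), len(X[0]), len(F)
    total = 0.0
    for i in range(n):
        for j in range(m):
            rate = sum(L[i][t] * F[t][j] for t in range(k))
            total += rate - X[i][j] * math.log(rate + eps)
    return total / (n * m)


X_counts = [[4, 0], [0, 4]]
L_fixed = [[1.0, 0.0], [0.0, 1.0]]
F_good = [[4.0, 0.1], [0.1, 4.0]]  # rates close to the observed counts
F_bad = [[0.1, 4.0], [4.0, 0.1]]   # rates placed on the wrong terms

good_loss = poisson_nmf_loss(X_counts, L_fixed, F_good)
bad_loss = poisson_nmf_loss(X_counts, L_fixed, F_bad)
print(good_loss < bad_loss)  # the better fit has the lower loss
```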

Plot loss curve

```{python}
plot_loss(losses, output_file="loss.png")
```

## Post-process results

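Derive the learned document-topic matrix `L` and topic-term matrix `F` by applying a row-wise softmax to the model parameters.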

```{python}
# Disable gradient tracking; the parameters are only read here.
with torch.no_grad():
    learned_L = torch.softmax(model.L.weight, dim=1).cpu().numpy()
    learned_F = torch.softmax(model.F, dim=1).cpu().numpy()
```
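
With `dim=1`, `torch.softmax` normalizes each row independently, so every row of the resulting matrices is a probability distribution. The plain-Python sketch below shows the same operation on a small, made-up matrix; `softmax` and `raw_weights` are illustrative names only.

```python
import math


def softmax(row):
    """Map a list of real numbers to positive values that sum to one."""
    shift = max(row)  # subtracting the max avoids overflow in exp()
    exps = [math.exp(v - shift) for v in row]
    total = sum(exps)
    return [e / total for e in exps]


raw_weights = [[2.0, 0.0, -1.0], [0.5, 0.5, 0.5]]
proportions = [softmax(row) for row in raw_weights]

for row in proportions:
    print([round(p, 3) for p in row], round(sum(row), 6))
```

Larger raw values receive larger shares, and equal raw values receive equal shares.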

Align the learned topics with the true topics, since the order in which topics are learned is arbitrary.

```{python}
aligned_indices = align_topics(true_F, learned_F)
learned_F_aligned = learned_F[aligned_indices]
learned_L_aligned = learned_L[:, aligned_indices]
```
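
One plausible way to compute such an alignment is to pick the permutation of learned topics that maximizes the total cosine similarity to the true topics. The sketch below does this by brute force, which is only practical for small `k`; it is not a description of how `align_topics` works internally, and `align_by_brute_force` is a hypothetical helper.

```python
import math
from itertools import permutations


def cosine(u, v):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    norm = math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v))
    return dot / norm


def align_by_brute_force(true_rows, learned_rows):
    """Return indices such that learned_rows[indices[i]] best matches true_rows[i]."""
    k = len(true_rows)
    best = max(
        permutations(range(k)),
        key=lambda perm: sum(
            cosine(true_rows[i], learned_rows[perm[i]]) for i in range(k)
        ),
    )
    return list(best)


true_toy = [[0.8, 0.1, 0.1], [0.1, 0.8, 0.1], [0.1, 0.1, 0.8]]
learned_toy = [true_toy[2], true_toy[0], true_toy[1]]  # same topics, shuffled

topic_order = align_by_brute_force(true_toy, learned_toy)
realigned = [learned_toy[i] for i in topic_order]
print(topic_order)
print(realigned == true_toy)
```

The returned indices follow the same convention as `aligned_indices` above: they reorder the rows of `F` and, correspondingly, the columns of `L`. For larger `k`, an assignment solver avoids the factorial cost of enumerating permutations.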

Sort the documents, applying the same order to the true and learned matrices so that their rows stay comparable.

```{python}
sorted_indices = sort_documents(true_L)
true_L_sorted = true_L[sorted_indices]
learned_L_sorted = learned_L_aligned[sorted_indices]
```
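
The exact ordering rule used by `sort_documents` is not stated here. A common convention for this kind of plot, assumed in the sketch below, is to group documents by their dominant topic and, within each group, order them by decreasing weight of that topic. `sort_by_dominant_topic` is a hypothetical helper.

```python
def sort_by_dominant_topic(L_rows):
    """Return document indices grouped by dominant topic, strongest first."""

    def sort_key(i):
        row = L_rows[i]
        dominant = max(range(len(row)), key=row.__getitem__)
        return (dominant, -row[dominant])

    return sorted(range(len(L_rows)), key=sort_key)


L_small = [
    [0.2, 0.8],
    [0.9, 0.1],
    [0.6, 0.4],
    [0.3, 0.7],
]
doc_order = sort_by_dominant_topic(L_small)
print(doc_order)
print([L_small[i] for i in doc_order])
```

Because the indices are computed once from the true matrix and then reused, any visual difference between the two sorted matrices reflects the fit rather than the ordering.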

## Visualize results

STRUCTURE plot

```{python}
plot_structure(
    true_L_sorted,
    title="True Document-Topic Distributions (Sorted)",
    output_file="L_true.png",
)
plot_structure(
    learned_L_sorted,
    title="Learned Document-Topic Distributions (Sorted and Aligned)",
    output_file="L_learned_aligned.png",
)
```
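
A STRUCTURE plot shows each document as a stacked bar whose segments are its topic proportions. The sketch below computes where each segment starts and ends for a single document; how `plot_structure` draws the bars internally is not covered in this article, so `stacked_segments` is only an illustrative helper.

```python
from itertools import accumulate


def stacked_segments(row):
    """Return (bottom, top) pairs for the stacked-bar segments of one document."""
    tops = list(accumulate(row))
    bottoms = [0.0] + tops[:-1]
    return list(zip(bottoms, tops))


segments = stacked_segments([0.5, 0.25, 0.25])
for topic, (bottom, top) in enumerate(segments):
    print(f"topic {topic}: from {bottom} to {top}")
```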

Top terms plot

```{python}
plot_top_terms(
    true_F,
    n_top_terms=15,
    title="Top Terms per Topic - True F Matrix",
    output_file="F_top_terms_true.png",
)
plot_top_terms(
    learned_F_aligned,
    n_top_terms=15,
    title="Top Terms per Topic - Learned F Matrix (Aligned)",
    output_file="F_top_terms_learned_aligned.png",
)
```
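
The quantity behind a top-terms plot is simple: for each topic (a row of `F`), take the term indices with the largest probabilities. The sketch below shows that selection on a made-up matrix; `top_term_indices` is a hypothetical helper, not part of tinytopics.

```python
def top_term_indices(F_rows, n_top_terms):
    """For each topic row, return the indices of its highest-probability terms."""
    return [
        sorted(range(len(row)), key=lambda j: row[j], reverse=True)[:n_top_terms]
        for row in F_rows
    ]


F_small = [
    [0.05, 0.40, 0.10, 0.30, 0.15],
    [0.50, 0.05, 0.25, 0.05, 0.15],
]
top_terms = top_term_indices(F_small, n_top_terms=2)
print(top_terms)
```

With a real vocabulary, these indices would be mapped back to the corresponding words before plotting.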