refactor: move more queries from tb to ch #2655

Merged 1 commit on Nov 12, 2024

Conversation

@chronark (Collaborator) commented Nov 12, 2024

Summary by CodeRabbit

Release Notes

  • New Features

    • Added an outline field and updated the recommendations field in the evals table.
    • Added a new schema for managing billing rate limits and verifications in the ClickHouse database.
    • New function to retrieve billable rate limits for specified workspaces.
  • Improvements

    • Enhanced error handling and logging mechanisms in various tasks and functions.
    • Improved clarity and robustness in code structure and organization across multiple files.
  • Refactor

    • Reorganized import statements for better clarity and maintainability.
    • Streamlined existing functions and their logic without altering core functionalities.
  • Bug Fixes

    • Resolved issues related to data retrieval and logging in several components.

changeset-bot commented Nov 12, 2024

⚠️ No Changeset found

Latest commit: db1519c

Merging this PR will not cause a version bump for any packages. If these changes should not result in a new version, you're good to go. If these changes should result in a version bump, you need to add a changeset.

This PR includes no changesets

When changesets are added to this PR, you'll see the packages that this PR includes changesets for and the associated semver types


vercel bot commented Nov 12, 2024

The latest updates on your projects:

Name         Status    Updated (UTC)
dashboard    ✅ Ready   Nov 12, 2024 6:15pm
engineering  ✅ Ready   Nov 12, 2024 6:15pm
play         ✅ Ready   Nov 12, 2024 6:15pm
workflows    ✅ Ready   Nov 12, 2024 6:15pm
www          ✅ Ready   Nov 12, 2024 6:15pm

github-actions bot commented Nov 12, 2024

Thank you for following the naming conventions for pull request titles! 🙏

coderabbitai bot commented Nov 12, 2024

📝 Walkthrough

This pull request spans multiple files across the billing app, the dashboard, and the internal ClickHouse package. The changes primarily reorder import statements for clarity, with several structural updates to database schemas, particularly in the evals, sections, and billing components. New fields have been added to the evals table, and a new ClickHouse schema for billable verifications and rate limits has been introduced. Several unused components, including the dashboard's Tinybird integration, have been removed as part of the cleanup.

Changes

  • apps/billing/src/lib/db-marketing/schemas/entries.ts: Reordered import statements; no functional changes.
  • apps/billing/src/lib/db-marketing/schemas/evals.ts: Reordered imports; added outline field and updated recommendations field in the evals table schema.
  • apps/billing/src/lib/db-marketing/schemas/firecrawl.ts: Reordered imports; no changes to the table structure or functionality.
  • apps/billing/src/lib/db-marketing/schemas/keywords.ts: Reordered imports; no changes to the table structure or functionality.
  • apps/billing/src/lib/db-marketing/schemas/searchQuery.ts: Reordered import statements; no functional changes.
  • apps/billing/src/lib/db-marketing/schemas/sections.ts: Added imports for int, mysqlEnum, and primaryKey; defined new sectionsToKeywords and sectionContentTypes tables.
  • apps/billing/src/lib/db-marketing/schemas/serper.ts: Reordered import statements; no changes to the table structure or functionality.
  • apps/billing/src/lib/firecrawl.ts: Streamlined imports; enhanced error handling in getOrCreateFirecrawlResponse and getOrCreateSummary functions.
  • apps/billing/src/lib/keywords.ts: Integrated zod for schema validation; updated functions to use validation schema.
  • apps/billing/src/lib/search-query.ts: Reordered import statements; no functional changes.
  • apps/billing/src/trigger/glossary/_generate-glossary-entry.ts: Reordered import statements; no functional changes.
  • apps/billing/src/trigger/glossary/create-pr.ts: Reordered import statements; no functional changes.
  • apps/billing/src/trigger/glossary/draft-sections.ts: Updated type imports and parameter types for clarity.
  • apps/billing/src/trigger/glossary/evals.ts: Reordered imports; no changes to core functionality.
  • apps/billing/src/trigger/glossary/generate-outline.ts: Reordered imports; enhanced error handling in generateOutlineTask.
  • apps/billing/src/trigger/glossary/keyword-research.ts: Enhanced keyword research capabilities by integrating additional data sources.
  • apps/billing/src/trigger/glossary/seo-meta-tags.ts: Reordered import statements; no functional changes.
  • apps/dashboard/app/(app)/apis/[apiId]/page.tsx: Renamed variable emptyOverTime to _emptyOverTime; removed console logs.
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/analytics/page.tsx: Removed interval parameter; simplified component functionality.
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/layout.tsx: Removed "Logs" navigation item; no other changes.
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/page.tsx: Deleted component for displaying semantic cache logs.
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/table.tsx: Deleted LogsTable component; removed associated logic.
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/util.ts: Deleted utility function for calculating log intervals.
  • apps/dashboard/app/(app)/settings/billing/page.tsx: Updated method for fetching rate limits; no changes to overall logic.
  • apps/dashboard/lib/audit.ts: Introduced new schema for validating audit log data; updated insertAuditLogs function.
  • apps/dashboard/lib/tinybird.ts: Deleted file containing Tinybird API interactions.
  • apps/dashboard/lib/trpc/routers/key/createRootKey.ts: Updated import for UnkeyAuditLog type; enhanced error handling in mutation logic.
  • apps/dashboard/lib/trpc/routers/rbac.ts: Restructured imports; added upsertPermissions function with enhanced error handling.
  • internal/clickhouse/schema/025_create_billable_verifications_v2.sql: Introduced new table and materialized view for billable verifications.
  • internal/clickhouse/schema/026_create_billable_ratelimits_v1.sql: Created table and materialized view for billing rate limits.
  • internal/clickhouse/src/billing.ts: Added getBillableRatelimits function; updated getBillableVerifications to point to a new source table.
  • internal/clickhouse/src/index.ts: Added getBillableRatelimits to the billing property of the ClickHouse class.

Suggested reviewers

  • mcstepp
  • perkinsjr

@chronark (Collaborator, Author) commented:

Going to self-approve this, as the changes are mostly formatting :/

coderabbitai bot left a comment

Actionable comments posted: 4

🧹 Outside diff range and nitpick comments (42)
internal/clickhouse/schema/026_create_billable_ratelimits_v1.sql (2)

3-12: Consider adding constraints and optimizing table structure

While the basic table structure is correct, consider these improvements for better data integrity and performance:

  1. Add NOT NULL constraints to prevent null values
  2. Add CHECK constraints for year/month validity
  3. Consider partitioning by year for better query performance
  4. Consider adding a TTL policy if historical data isn't needed indefinitely
 CREATE TABLE billing.billable_ratelimits_per_month_v1
 (
-  year          Int,
-  month         Int,
-  workspace_id  String,
-  count         Int64
+  year          Int NOT NULL,
+  month         Int NOT NULL,
+  workspace_id  String NOT NULL,
+  count         Int64 NOT NULL,
+  CONSTRAINT valid_year CHECK (year >= 2020),
+  CONSTRAINT valid_month CHECK (month BETWEEN 1 AND 12)
 )
 ENGINE = SummingMergeTree()
+PARTITION BY year
 ORDER BY (workspace_id, year, month)
+TTL makeDate(year, month, 1) + INTERVAL 3 YEAR
 ;

1-30: Consider data migration and consistency strategy

Since this is part of moving from Tinybird to ClickHouse:

  1. Ensure there's a data migration plan for existing rate limit data
  2. Consider running both systems in parallel initially to validate data consistency
  3. Plan for a gradual cutover strategy to minimize risk

Would you like help creating a data migration and validation plan?
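
For illustration, a one-off backfill could seed the new table before cutover. A minimal sketch, assuming the source table the materialized view reads from (ratelimits.ratelimits_per_month_v1) exposes time and workspace_id columns; the names are taken from the review notes and may differ:

import { createClient } from "@clickhouse/client";

// Hypothetical one-off backfill script; table and column names are assumptions.
const ch = createClient({ url: process.env.CLICKHOUSE_URL });

await ch.command({
  query: `
    INSERT INTO billing.billable_ratelimits_per_month_v1 (year, month, workspace_id, count)
    SELECT toYear(time) AS year, toMonth(time) AS month, workspace_id, count(*) AS count
    FROM ratelimits.ratelimits_per_month_v1
    GROUP BY workspace_id, year, month
  `,
});

Running the backfill before the materialized view starts receiving inserts (or de-duplicating afterwards) avoids double counting in the SummingMergeTree.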

internal/clickhouse/schema/025_create_billable_verifications_v2.sql (1)

2-11: Consider optimizing table definition for better performance

While the basic structure is correct, consider these improvements for better performance and data management:

  1. Add explicit partition by year for efficient data management
  2. Define TTL policy if historical data cleanup is needed
  3. Consider using more specific integer types (Int16) for year/month
 CREATE TABLE billing.billable_verifications_per_month_v2
 (
-  year          Int,
-  month         Int,
+  year          Int16,
+  month         Int16,
   workspace_id  String,
   count         Int64
 )
 ENGINE = SummingMergeTree()
+PARTITION BY year
 ORDER BY (workspace_id, year, month)
internal/clickhouse/src/billing.ts (1)

4-38: Implementation looks good with minor suggestions

The function is well-structured with proper:

  • Input validation using Zod
  • Error handling for null/undefined cases
  • Default values
  • Clear SQL query with proper parameter binding

Consider adding JSDoc comments to document:

  • The return type and its meaning
  • Whether the count represents unique rate limits or total occurrences

Apply this diff to improve documentation:

+/**
+ * Get the total number of billable rate limits for a workspace in a specific month
+ * @param ch - ClickHouse querier instance
+ * @returns Promise<number> - The total count of rate limits (0 if none found)
+ */
 export function getBillableRatelimits(ch: Querier) {
apps/billing/src/lib/db-marketing/schemas/evals.ts (3)

Line range hint 18-19: Consider using JSON column type and proper serialization.

The recommendations and outline fields are storing JSON data as TEXT with string defaults. This approach has several potential issues:

  1. No JSON validation at the database level
  2. Less efficient storage and querying compared to native JSON columns
  3. String literal defaults instead of proper JSON serialization

Consider these improvements:

- recommendations: text("recommendations").notNull().default("[]"),
- outline: text("outline").default("[]"),
+ recommendations: json("recommendations").notNull().$default(() => [] as const),
+ outline: json("outline").$default(() => [] as const),

This change would:

  • Ensure JSON validity at the database level
  • Enable better query performance for JSON operations
  • Properly serialize default values

Line range hint 39-65: Add validation schema for the outline field.

The new outline field lacks a corresponding Zod schema for validation. This could lead to type safety issues and inconsistent data.

Consider adding a validation schema:

export const outlineSchema = z.object({
  outline: z.array(
    z.object({
      // Define the expected structure of your outline items
      // Example:
      title: z.string(),
      content: z.string(),
      // Add more fields as needed
    })
  ),
});

Then extend the existing schemas:

 export const insertEvalSchema = createInsertSchema(evals)
-  .extend({})
+  .extend({
+    outline: outlineSchema.optional(),
+  })
   .omit({ id: true });

Missing Migration for evals Schema Changes.
The current migration strategy does not include the necessary changes for the evals schema in apps/billing/src/lib/db-marketing/schemas/evals.ts.

🔗 Analysis chain

Line range hint 1-65: Verify database migration strategy.

As part of moving queries from TB to CH (as per PR title), ensure these schema changes are properly handled in the migration process.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for related migration files
fd -e sql -e ts migration apps/billing/src/lib/db-marketing/migrations

# Look for any existing data transformation logic
rg -l "outline|recommendations" apps/billing/src/lib/db-marketing

Length of output: 323


Script:

#!/bin/bash
# Search for any migration files in the entire repository
fd -e sql -e ts migration

# Search for data transformation logic related to evals
rg -l "evals|evalTypes" .

Length of output: 760


Script:

#!/bin/bash
# Search for references to 'evals' within migration files
rg 'evals' $(fd -e ts migration)

Length of output: 478

apps/billing/src/lib/search-query.ts (3)

Line range hint 29-42: Consider improving the prompt engineering

The NOTE comment indicates known issues with the prompt quality. The current prompt might benefit from:

  1. More specific examples of good/bad queries
  2. Explicit constraints on query length
  3. Domain-specific context about API development

Would you like me to help generate an improved prompt structure?
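
As one illustration (the contents here are invented for the example, not the project's actual prompt), a tighter structure might look like:

// Illustrative only: bakes in examples, a length limit, and domain context.
export const SEARCH_QUERY_SYSTEM_PROMPT = `
You generate exactly one Google search query for researching an API-development term.

Rules:
- At most 10 words; no quotes or boolean operators.
- Prefer vendor-neutral, technical phrasing.

Examples:
- Good: idempotency keys REST API best practices
- Bad: what is an API (too broad to surface authoritative sources)
`;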


Line range hint 45-71: Add retry logic and explicit error handling for database operations

The transaction could fail due to temporary database issues. Consider:

  1. Adding retry logic for transient failures
  2. Explicit error handling for transaction rollbacks
  3. Logging of failure scenarios

Here's a suggested improvement:

+ import { retry } from '@/lib/utils/retry';  // Add retry utility

- await db.transaction(async (tx) => {
+ await retry(async () => {
+   return db.transaction(async (tx) => {
      // Insert search query
      await tx
        .insert(searchQueries)
        .values({
          ...generatedQuery.object,
        })
        .onDuplicateKeyUpdate({
          set: {
            updatedAt: sql`now()`,
          },
        });

      // Insert entry
      await tx
        .insert(entries)
        .values({
          inputTerm: term,
        })
        .onDuplicateKeyUpdate({
          set: {
            updatedAt: sql`now()`,
          },
        });
+   });
+ }, {
+   retries: 3,
+   minTimeout: 1000,
+   factor: 2,
+ });
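
The @/lib/utils/retry helper imported above does not exist in the repo yet; a minimal sketch of the assumed interface, with exponential backoff:

// Sketch of the assumed retry utility: delay grows as minTimeout * factor^attempt.
type RetryOptions = { retries: number; minTimeout: number; factor: number };

export async function retry<T>(fn: () => Promise<T>, opts: RetryOptions): Promise<T> {
  let lastError: unknown;
  for (let attempt = 0; attempt <= opts.retries; attempt++) {
    try {
      return await fn();
    } catch (err) {
      lastError = err;
      if (attempt === opts.retries) break;
      await new Promise((resolve) => setTimeout(resolve, opts.minTimeout * opts.factor ** attempt));
    }
  }
  throw lastError;
}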

Unusual OpenAI model name "gpt-4o-mini" detected in multiple files:

  • apps/billing/src/trigger/glossary/generate-outline.ts
  • apps/billing/src/trigger/glossary/evals.ts
  • apps/billing/src/trigger/glossary/draft-sections.ts
  • apps/billing/src/lib/search-query.ts
  • apps/billing/src/lib/firecrawl.ts
  • apps/billing/src/lib/keywords.ts
🔗 Analysis chain

Line range hint 31-31: Verify the OpenAI model name

The model name "gpt-4o-mini" appears unusual and might be a typo. Standard OpenAI models typically follow patterns like "gpt-4", "gpt-3.5-turbo", etc.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other OpenAI model usages in the codebase
rg -g '!{*.lock,package-lock.json}' 'openai\(["\']gpt-' --type ts

Length of output: 130


Script:

#!/bin/bash
# Improved search for OpenAI model usages in the codebase, accounting for potential typos
rg -g '!{*.lock,package-lock.json}' 'openai\(["\']gpt-4[o0]-mini["\']' --type ts

Length of output: 93


Script:

#!/bin/bash
# Improved search for OpenAI model usages in the codebase, correcting quotation marks
rg -g '!{*.lock,package-lock.json}' 'openai\(["'\'']gpt-4[o0]-mini["'\'']\)' --type ts

Length of output: 867

apps/billing/src/lib/db-marketing/schemas/sections.ts (3)

Line range hint 42-51: Document content types and consider exporting the array

The content types array defines valid section types but lacks documentation explaining each type's purpose. Additionally, consider exporting this constant for reuse in other parts of the application.

Consider applying these improvements:

+/**
+ * Valid content types for sections:
+ * - listicle: Ordered or unordered lists of items
+ * - table: Tabular data presentation
+ * - image: Image content
+ * - code: Code snippets or examples
+ * - infographic: Visual information graphics
+ * - timeline: Time-based content presentation
+ * - other: Miscellaneous content types
+ * - text: Plain text content
+ * - video: Video content
+ */
-const contentTypes = [
+export const contentTypes = [
   "listicle",
   "table",
   // ... rest of the types
] as const;

Also applies to: 53-59


Line range hint 82-85: Standardize timestamp precision across tables

The timestamps in sectionsToKeywords use millisecond precision (fsp: 3), while other tables use the default precision. Consider standardizing this across all tables for consistency.
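
For example, shared column helpers keep the precision uniform (a sketch against drizzle's mysql-core API; the helper name is illustrative):

import { timestamp } from "drizzle-orm/mysql-core";

// Define once and spread into each mysqlTable so all timestamps share fsp: 3.
export const trackedColumns = {
  createdAt: timestamp("created_at", { fsp: 3 }).notNull().defaultNow(),
  updatedAt: timestamp("updated_at", { fsp: 3 }).onUpdateNow(),
};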


Database migration strategy not fully implemented

The table sectionsToKeywords is still using MySQL schema definitions. Consider updating it to ClickHouse-specific schema definitions to align with the migration objective.

🔗 Analysis chain

Line range hint 78-91: Verify database migration strategy

Given that this PR's objective is to move queries from TB to CH (Tinybird to ClickHouse), please verify if this table should be using ClickHouse-specific schema definitions instead of MySQL.

Run this script to check for other ClickHouse schema definitions in the codebase:

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Look for ClickHouse schema definitions and verify consistency

# Search for ClickHouse table definitions
rg -l "clickhouse.*Table|createTable.*clickhouse" --type ts

# Search for similar junction tables in ClickHouse
rg "many-to-many|junction.*table|.*ToKeywords" --type ts

Length of output: 2613

apps/billing/src/trigger/glossary/keyword-research.ts (6)

1-11: Consider organizing imports for better maintainability

Consider grouping imports into sections:

  1. External dependencies (trigger.dev, drizzle-orm)
  2. Internal shared utilities
  3. Local module imports
+ // External dependencies
import { AbortTaskRunError, task } from "@trigger.dev/sdk/v3";
import { sql, inArray, and, eq } from "drizzle-orm";

+ // Internal shared utilities
import { db } from "@/lib/db-marketing/client";
import { keywords } from "@/lib/db-marketing/schemas";
import { getOrCreateFirecrawlResponse } from "@/lib/firecrawl";

+ // Local imports
import { getOrCreateKeywordsFromHeaders, getOrCreateKeywordsFromTitles } from "../../lib/keywords";
import { getOrCreateSearchQuery } from "../../lib/search-query";
import { getOrCreateSearchResponse } from "../../lib/serper";
import type { CacheStrategy } from "./_generate-glossary-entry";

Line range hint 13-19: Consider a more descriptive constant name

The constant THREE could be more descriptively named to indicate its purpose, such as TOP_RESULTS_LIMIT or MAX_SEARCH_RESULTS.

-export const THREE = 3;
+export const TOP_RESULTS_LIMIT = 3;

Line range hint 24-35: Consider enhancing cache invalidation logic

The current caching strategy only checks for existence but doesn't consider the age of the cached data. Consider adding a timestamp-based invalidation strategy.

 if (existing.length > 0 && onCacheHit === "stale") {
+  const oldestAllowedTimestamp = new Date(Date.now() - 7 * 24 * 60 * 60 * 1000); // 7 days
+  const isStale = existing.some(entry => entry.updatedAt < oldestAllowedTimestamp);
+  if (!isStale) {
     return {
       message: `Found existing keywords for ${term}`,
       term,
       keywords: existing,
     };
+  }
 }

Line range hint 47-57: Add timeout handling for external API calls

The external API calls to fetch search results and content could benefit from timeout handling to prevent hanging operations.

 const topThree = searchResponse.serperOrganicResults
   .sort((a, b) => a.position - b.position)
   .slice(0, THREE);

+const FETCH_TIMEOUT = 30000; // 30 seconds
+
+const timeoutPromise = (promise: Promise<any>, ms: number) => {
+  return Promise.race([
+    promise,
+    new Promise((_, reject) => 
+      setTimeout(() => reject(new Error('Operation timed out')), ms)
+    )
+  ]);
+};
+
 // Get content for top 3 results
 const firecrawlResults = await Promise.all(
   topThree.map((result) =>
-    getOrCreateFirecrawlResponse({ url: result.link, connectTo: { term: term } }),
+    timeoutPromise(
+      getOrCreateFirecrawlResponse({ url: result.link, connectTo: { term: term } }),
+      FETCH_TIMEOUT
+    ),
   ),
 );

Line range hint 74-94: Wrap database operations in a transaction

The related search keywords insertion should be wrapped in a transaction to ensure data consistency.

+const insertRelatedSearches = async (searchQuery, searches) => {
+  return await db.transaction(async (tx) => {
     await tx
       .insert(keywords)
       .values(
         searches.map((search) => ({
           inputTerm: searchQuery.inputTerm,
           keyword: search.query.toLowerCase(),
           source: "related_searches",
           updatedAt: sql`now()`,
         })),
       )
       .onDuplicateKeyUpdate({
         set: {
           updatedAt: sql`now()`,
         },
       });
     
     return await tx.query.keywords.findMany({
       where: and(
         eq(keywords.inputTerm, searchQuery.inputTerm),
         eq(keywords.source, "related_searches"),
         inArray(
           keywords.keyword,
           searches.map((search) => search.query.toLowerCase()),
         ),
       ),
     });
+  });
+};

-await db.insert(keywords)...
-const insertedRelatedSearches = await db.query.keywords...
+const insertedRelatedSearches = await insertRelatedSearches(
+  searchQuery,
+  searchResponse.serperRelatedSearches
+);

Line range hint 96-106: Enhance error handling and result validation

Consider adding validation for the final results before returning and improve error handling for better debugging.

+import { z } from "zod";
+
+const KeywordResultSchema = z.object({
+  message: z.string(),
+  term: z.string(),
+  keywords: z.array(z.object({
+    keyword: z.string(),
+    source: z.string(),
+    inputTerm: z.string(),
+    updatedAt: z.date()
+  }))
+});
+
+try {
   const result = {
     message: `Keyword Research for ${term} completed`,
     term: searchQuery.inputTerm,
     keywords: [...keywordsFromTitles, ...keywordsFromHeaders, ...insertedRelatedSearches],
   };
+  return KeywordResultSchema.parse(result);
+} catch (error) {
+  console.error('Failed to validate keyword research results:', error);
+  throw new Error('Invalid keyword research results');
+}
-  return result;
apps/dashboard/lib/audit.ts (2)

55-92: Add JSDoc comments to improve type documentation

The UnkeyAuditLog type would benefit from documentation explaining its purpose and usage.

+/**
+ * Represents an audit log entry in the Unkey system.
+ * @property workspaceId - The ID of the workspace where the action occurred
+ * @property event - The type of event that occurred
+ * @property resources - Array of resources affected by the action
+ */
 export type UnkeyAuditLog = {

Line range hint 93-183: Add error handling and consider batch operations

Two suggestions to improve the implementation:

  1. Add error handling for database operations
  2. Consider using batch inserts for better performance when handling multiple logs
 export async function insertAuditLogs(
   db: Transaction | Database,
   logOrLogs: MaybeArray<UnkeyAuditLog>,
 ) {
   const logs = Array.isArray(logOrLogs) ? logOrLogs : [logOrLogs];
 
   if (logs.length === 0) {
     return Promise.resolve();
   }
 
+  try {
     for (const log of logs) {
       // ... bucket handling ...
     }
 
-    // 2. Insert the log
-    const auditLogId = newId("auditLog");
-    await db.insert(schema.auditLog).values({
+    // 2. Batch insert logs
+    const auditLogValues = logs.map(log => ({
+      id: newId("auditLog"),
       workspaceId: log.workspaceId,
       // ... other fields ...
     }));
+    await db.insert(schema.auditLog).values(auditLogValues);
 
     // ... resource handling ...
+  } catch (error) {
+    console.error("Failed to insert audit logs:", error);
+    throw new Error("Failed to record audit logs");
+  }
 }
apps/billing/src/trigger/glossary/create-pr.ts (3)

Line range hint 82-83: Consider moving repository details to configuration

The hardcoded owner and repo values should be moved to environment variables or configuration for better maintainability and reusability.

-    const owner = "p6l-richard";
-    const repo = "unkey";
+    const owner = process.env.GITHUB_REPO_OWNER;
+    const repo = process.env.GITHUB_REPO_NAME;

Line range hint 87-117: Add error handling for GitHub API operations

The branch creation and deletion operations lack proper error handling. Consider wrapping these operations in try/catch blocks to handle potential API failures gracefully.

Example implementation:

-    const mainRef = await octokit.git.getRef({
-      owner,
-      repo,
-      ref: "heads/main",
-    });
+    let mainRef;
+    try {
+      mainRef = await octokit.git.getRef({
+        owner,
+        repo,
+        ref: "heads/main",
+      });
+    } catch (error) {
+      throw new AbortTaskRunError(
+        `Failed to get main branch reference: ${(error as Error).message}`
+      );
+    }

Line range hint 19-33: Optimize database queries

The code makes multiple separate queries to fetch the same entry. Consider consolidating these queries to improve performance.

-    const existing = await db.query.entries.findFirst({
-      where: eq(entries.inputTerm, input),
-      columns: {
-        id: true,
-        inputTerm: true,
-        githubPrUrl: true,
-      },
-      orderBy: (entries, { desc }) => [desc(entries.createdAt)],
-    });
-
-    if (existing?.githubPrUrl && onCacheHit === "stale") {
-      return {
-        entry: existing,
-        message: `Found existing PR for ${input}.mdx`,
-      };
-    }
-
-    // ==== 1. Prepare MDX file ====
-    const entry = await db.query.entries.findFirst({
+    const entry = await db.query.entries.findFirst({
       where: eq(entries.inputTerm, input),
+      columns: {
+        id: true,
+        inputTerm: true,
+        githubPrUrl: true,
+        dynamicSectionsContent: true,
+        metaTitle: true,
+        metaDescription: true,
+      },
       orderBy: (entries, { desc }) => [desc(entries.createdAt)],
     });

+    if (entry?.githubPrUrl && onCacheHit === "stale") {
+      return {
+        entry: {
+          id: entry.id,
+          inputTerm: entry.inputTerm,
+          githubPrUrl: entry.githubPrUrl,
+        },
+        message: `Found existing PR for ${input}.mdx`,
+      };
+    }
apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (1)

Line range hint 35-37: Resolve inconsistency in retry configuration.

The task's maxAttempts is set to 0, but the documentation comments mention "maxAttempts for a subtask is 5". This seems inconsistent and might prevent the intended retry behavior for handling non-deterministic LLM executions.

Consider updating the retry configuration:

  retry: {
-   maxAttempts: 0,
+   maxAttempts: 5,
  },
apps/billing/src/lib/firecrawl.ts (2)

Line range hint 9-11: Add environment variable validation

The Firecrawl API key is accessed with a non-null assertion (!), which could cause runtime errors if the environment variable is missing.

Consider adding validation:

+if (!process.env.FIRECRAWL_API_KEY) {
+  throw new Error("FIRECRAWL_API_KEY environment variable is required");
+}
 const firecrawl = new FirecrawlApp({
-  apiKey: process.env.FIRECRAWL_API_KEY!,
+  apiKey: process.env.FIRECRAWL_API_KEY,
 });

Issue Found: Potential Typo in OpenAI Model Name

  • The model name "gpt-4o-mini" appears multiple times and may be a typo (e.g., "gpt-4-turbo").
  • Inconsistent model names like "gpt-4-turbo" and "gpt-4o" are also present.
🔗 Analysis chain

Line range hint 171-175: Verify the model name and consider extracting the system prompt

  1. The model name "gpt-4o-mini" seems unusual and might be a typo.
  2. The long system prompt could be moved to a constant for better maintainability.

Let's verify the model name usage:

Consider moving the system prompt to a constant:

const SUMMARY_SYSTEM_PROMPT = `You are the **Chief Technology Officer (CTO)** ...`;
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for OpenAI model names in the codebase
rg "openai\(['\"]gpt-" --type typescript

Length of output: 79


Script:

#!/bin/bash
# Search for OpenAI model names in the codebase with TypeScript extensions
rg "openai\(['\"]gpt-" --type ts --type tsx

Length of output: 75


Script:

#!/bin/bash
# Search for OpenAI model names in the codebase within .ts and .tsx files
rg "openai\(['\"]gpt-" --glob "*.ts" --glob "*.tsx"

Length of output: 1002

apps/billing/src/lib/keywords.ts (4)

Line range hint 83-95: Consider enhancing error handling for the OpenAI API call.

The generateObject call could fail due to API errors, rate limits, or invalid responses. Consider adding specific error handling:

   const keywordsFromTitles = await generateObject({
     model: openai("gpt-4o-mini"),
     system: keywordResearchSystemPrompt,
     prompt: promptTitles,
     schema: z.object({
       keywords: z.array(z.object({ keyword: z.string(), sourceUrl: z.string().url() })),
       keywordsWithBrandNames: z.array(
         z.object({ keyword: z.string(), sourceUrl: z.string().url() }),
       ),
     }),
-  });
+  }).catch((error) => {
+    if (error.code === 'rate_limit_exceeded') {
+      throw new Error(`OpenAI rate limit exceeded: ${error.message}`);
+    }
+    throw new Error(`Failed to generate keywords: ${error.message}`);
+  });

Line range hint 98-111: Consider wrapping database operations in a transaction.

The database operations (insert and subsequent query) should be wrapped in a transaction to ensure data consistency.

+  return await db.transaction(async (tx) => {
-    await db
+    await tx
      .insert(keywords)
       .values(
         keywordsFromTitles.object.keywords.map((keyword) => ({
           inputTerm: term,
           keyword: keyword.keyword.toLowerCase(),
           sourceUrl: keyword.sourceUrl,
           source: "titles",
         })),
       )
       .onDuplicateKeyUpdate({
         set: {
           updatedAt: sql`now()`,
         },
       });
 
-    return db.query.keywords.findMany({
+    return tx.query.keywords.findMany({
       where: and(
         eq(keywords.inputTerm, term),
         eq(keywords.source, "titles"),
         inArray(
           keywords.keyword,
           keywordsFromTitles.object.keywords.map((k) => k.keyword.toLowerCase()),
         ),
       ),
     });
+  });

Line range hint 134-137: Consider improving the markdown header parsing regex.

The current regex /^##\s+(.*)$/gm might miss some edge cases. Consider using a more robust pattern:

-      ${firecrawlResponse.markdown?.match(/^##\s+(.*)$/gm)?.join("\n")}
+      ${firecrawlResponse.markdown?.match(/^#{1,6}\s+(.+?)(?:\s*{[^}]*})?$/gm)?.join("\n")}

This pattern will:

  • Match headers of all levels (h1-h6)
  • Handle optional trailing attributes in curly braces
  • Ensure non-empty header content
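
For instance, against a small sample:

// Quick check of the proposed pattern on sample markdown.
const markdown = "# Title\n## Setup {#setup}\nBody text\n###### Notes";
const headers = markdown.match(/^#{1,6}\s+(.+?)(?:\s*\{[^}]*\})?$/gm);
console.log(headers); // ["# Title", "## Setup {#setup}", "###### Notes"]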

Line range hint 115-190: Consider reducing code duplication with getOrCreateKeywordsFromTitles.

Both functions share similar database operations and OpenAI call patterns. Consider extracting common logic into shared utilities:

interface KeywordExtractionParams {
  term: string;
  source: 'titles' | 'headers';
  prompt: string;
}

async function extractAndStoreKeywords({ term, source, prompt }: KeywordExtractionParams) {
  const keywords = await generateObject({
    model: openai("gpt-4o-mini"),
    system: keywordResearchSystemPrompt,
    prompt,
    schema: z.object({
      keywords: z.array(z.object({ keyword: z.string(), sourceUrl: z.string().url() })),
      keywordsWithBrandNames: z.array(
        z.object({ keyword: z.string(), sourceUrl: z.string().url() }),
      ),
    }),
  });

  return db.transaction(async (tx) => {
    // ... shared database operations
  });
}
apps/billing/src/trigger/glossary/draft-sections.ts (3)

Line range hint 15-106: Consider implementing rate limiting and error handling for AI operations.

The task uses multiple OpenAI API calls in sequence without explicit rate limiting or comprehensive error handling. This could lead to reliability issues or unexpected costs.

Consider the following (points 1 and 2 are sketched after this list):

  1. Implementing a rate limiter for OpenAI API calls
  2. Adding retry logic with exponential backoff
  3. Validating AI response length and structure
  4. Monitoring and logging AI operation costs
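
A minimal sketch of points 1 and 2, assuming the SDK's errors expose an HTTP status field:

// Small concurrency gate plus exponential backoff on HTTP 429 responses.
const MAX_CONCURRENT = 2;
let active = 0;
const waiters: Array<() => void> = [];

export async function withAiLimits<T>(call: () => Promise<T>, maxRetries = 3): Promise<T> {
  if (active >= MAX_CONCURRENT) {
    await new Promise<void>((resolve) => waiters.push(resolve));
  }
  active++;
  try {
    for (let attempt = 0; ; attempt++) {
      try {
        return await call();
      } catch (err) {
        const status = (err as { status?: number }).status;
        if (attempt >= maxRetries || status !== 429) throw err;
        await new Promise((resolve) => setTimeout(resolve, 1000 * 2 ** attempt));
      }
    }
  } finally {
    active--;
    waiters.shift()?.();
  }
}

Call sites would then wrap each model invocation, e.g. await withAiLimits(() => generateText({ model, system, prompt })).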

Line range hint 107-190: Refactor AI model configuration for better maintainability.

The code has hardcoded model names (gpt-4-turbo, gpt-4o-mini) scattered across different functions. This could make model updates or A/B testing difficult.

Consider extracting model configuration to a central configuration:

const AI_CONFIG = {
  models: {
    draft: "gpt-4-turbo",
    review: "gpt-4o-mini",
    seo: "gpt-4o-mini"
  },
  maxTokens: {
    draft: 1000,
    review: 500,
    seo: 800
  }
} as const;

// Usage
const completion = await generateText({
  model: openai(AI_CONFIG.models.draft),
  maxTokens: AI_CONFIG.maxTokens.draft,
  system,
  prompt,
});

Line range hint 191-236: Consider extracting complex types for better maintainability.

The nested type definition for the entry parameter is complex and could be hard to maintain. Consider extracting it to a separate type definition.

type DynamicSection = typeof sections.$inferSelect & {
  contentTypes: Array<typeof sectionContentTypes.$inferSelect>;
  sectionsToKeywords: Array<
    typeof sectionsToKeywords.$inferSelect & {
      keyword: typeof keywords.$inferSelect;
    }
  >;
};

type EntryWithDynamicSections = typeof entries.$inferSelect & {
  dynamicSections: Array<DynamicSection>;
};

// Usage
function draftSections({
  term,
  entry,
}: {
  term: string;
  entry: EntryWithDynamicSections;
})
apps/dashboard/app/(app)/apis/[apiId]/page.tsx (3)

Line range hint 71-108: Consider adding error handling for unknown verification outcomes

The switch statement for processing verification outcomes has a default case that silently ignores unknown outcomes. This could hide potential issues if new outcome types are added in the future.

      case "FORBIDDEN":
        forbiddenOverTime.push({ x, y: d.count });
        break;
      default:
+       console.warn(`Unknown verification outcome encountered: ${d.outcome}`);
+       break;

Line range hint 28-41: Add error handling for missing keyAuthId

The code assumes api.keyAuthId exists (using ! operator) but should handle the case where it might be null more gracefully.

   const api = await db.query.apis.findFirst({
     where: (table, { eq, and, isNull }) =>
       and(eq(table.id, props.params.apiId), isNull(table.deletedAt)),
     with: {
       workspace: true,
     },
   });
-  if (!api || api.workspace.tenantId !== tenantId) {
+  if (!api || api.workspace.tenantId !== tenantId || !api.keyAuthId) {
     return redirect("/new");
   }

Line range hint 42-108: Consider memoizing verification data processing

The processing of verification data into different categories is performed on every render. Consider memoizing this computation using useMemo if this component is re-rendered frequently.

Here's how you could refactor this:

const processVerifications = useMemo(() => {
  const result = {
    successOverTime: [],
    ratelimitedOverTime: [],
    // ... other arrays
  };
  
  for (const d of verifications.sort((a, b) => a.time - b.time)) {
    const x = new Date(d.time).toISOString();
    switch (d.outcome) {
      case "":
      case "VALID":
        result.successOverTime.push({ x, y: d.count });
        break;
      // ... other cases
    }
  }
  
  return result;
}, [verifications]);
apps/billing/src/trigger/glossary/evals.ts (1)

Line range hint 158-321: Standardize logging format across tasks

The editorial evaluation task uses a different logging format ([workflow=glossary] [task=editorial_eval]) compared to other tasks. Consider standardizing the logging format across all tasks for consistency.
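
One lightweight option is a shared tag helper mirroring the editorial task's existing format (a sketch):

// Shared prefix so every glossary task logs in the same shape.
const logTag = (task: string) => `[workflow=glossary] [task=${task}]`;

const term = "idempotency"; // illustrative input
console.info(`${logTag("editorial_eval")} starting evaluation for term: ${term}`);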

Consider abstracting common task patterns

The three evaluation tasks share significant code structure. Consider creating a base evaluation task factory to reduce code duplication.

Here's a suggested approach:

type EvalType = 'technical' | 'seo' | 'editorial';

const createEvalTask = (type: EvalType) => {
  return task({
    id: `perform_${type}_eval`,
    run: async ({ input, onCacheHit = "stale", ...options }: TaskInput & EvalOptions) => {
      console.info(`Starting ${type} evaluation for term: ${input}`);

      // ... common implementation ...

      const result = {
        ratings: ratingsResult.output,
        recommendations: recommendationsResult.output.recommendations,
      };

      // Add outline for editorial type
      if (type === 'editorial') {
        return { ...result, outline: options.content };
      }

      return result;
    }
  });
};

export const performTechnicalEvalTask = createEvalTask('technical');
export const performSEOEvalTask = createEvalTask('seo');
export const performEditorialEvalTask = createEvalTask('editorial');
apps/billing/src/trigger/glossary/generate-outline.ts (3)

Line range hint 26-29: Address TODO comment with concrete improvements

The TODO comment indicates known issues but doesn't have a clear timeline or ownership. Consider breaking this into specific GitHub issues for:

  1. Task decomposition
  2. Database caching implementation
  3. Prompt improvements

Would you like me to help create GitHub issues for tracking these improvements?


Line range hint 26-196: Consider refactoring the task for better maintainability

The generateOutlineTask is quite large and handles multiple concerns. Consider:

  1. Breaking it into smaller, focused tasks
  2. Wrapping database operations in transactions
  3. Implementing proper rollback mechanisms for partial failures

Here's a suggested approach for breaking down the task:

// 1. Create separate tasks for each major step
export const generateInitialOutlineTask = task({/*...*/})
export const performEvaluationsTask = task({/*...*/})
export const persistOutlineTask = task({/*...*/})

// 2. Wrap database operations in transactions
async function persistOutline(entry, outline) {
  return await db.transaction(async (tx) => {
    const sectionIds = await tx.insert(sections)/*...*/
    await tx.insert(sectionsToKeywords)/*...*/
    await tx.insert(sectionContentTypes)/*...*/
    return sectionIds
  })
}

// 3. Main task becomes an orchestrator
export const generateOutlineTask = task({
  id: "generate_outline",
  run: async ({ term, onCacheHit }) => {
    const initialOutline = await generateInitialOutlineTask.triggerAndWait(/*...*/)
    const evaluations = await performEvaluationsTask.triggerAndWait(/*...*/)
    return await persistOutlineTask.triggerAndWait(/*...*/)
  }
})

Custom model gpt-4o-mini is used without proper configuration.

  • Ensure that gpt-4o-mini is defined and configured correctly in the production environment.
  • Review the configuration files to include settings for gpt-4o-mini.
🔗 Analysis chain

Line range hint 213-350: Verify the availability and capabilities of the custom OpenAI model

The code uses a custom model gpt-4o-mini. Ensure that:

  1. This model is properly configured and available in production
  2. It has sufficient context window for processing the input content
  3. It's cost-effective for the use case
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for other usages of this custom model
rg -l "gpt-4o-mini"

# Check for any configuration files that might define the model
fd -e json -e yaml -e env . | xargs rg "gpt-4o"

Length of output: 371


Script:

#!/bin/bash
# Check for any configuration files that define the 'gpt-4o-mini' model
fd -e json -e yaml -e env . | xargs rg "gpt-4o-mini"

Length of output: 52

apps/dashboard/lib/trpc/routers/rbac.ts (1)

Line range hint 270-313: Consider renaming upsertPermissions to better reflect its functionality

The function upsertPermissions currently inserts new permissions if they do not exist but does not update existing permissions. Since it doesn't perform an update on existing records, the term upsert might be misleading. To improve clarity, consider renaming the function to something like createPermissionsIfNotExist or ensurePermissionsExist.

📜 Review details

Configuration used: CodeRabbit UI
Review profile: CHILL

📥 Commits

Reviewing files that changed from the base of the PR and between ba369f9 and db1519c.

⛔ Files ignored due to path filters (1)
  • pnpm-lock.yaml is excluded by !**/pnpm-lock.yaml
📒 Files selected for processing (32)
  • apps/billing/src/lib/db-marketing/schemas/entries.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/evals.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/firecrawl.ts (2 hunks)
  • apps/billing/src/lib/db-marketing/schemas/keywords.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/searchQuery.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/sections.ts (1 hunks)
  • apps/billing/src/lib/db-marketing/schemas/serper.ts (1 hunks)
  • apps/billing/src/lib/firecrawl.ts (2 hunks)
  • apps/billing/src/lib/keywords.ts (1 hunks)
  • apps/billing/src/lib/search-query.ts (1 hunks)
  • apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (1 hunks)
  • apps/billing/src/trigger/glossary/create-pr.ts (1 hunks)
  • apps/billing/src/trigger/glossary/draft-sections.ts (1 hunks)
  • apps/billing/src/trigger/glossary/evals.ts (1 hunks)
  • apps/billing/src/trigger/glossary/generate-outline.ts (2 hunks)
  • apps/billing/src/trigger/glossary/keyword-research.ts (1 hunks)
  • apps/billing/src/trigger/glossary/seo-meta-tags.ts (1 hunks)
  • apps/dashboard/app/(app)/apis/[apiId]/page.tsx (1 hunks)
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/analytics/page.tsx (2 hunks)
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/layout.tsx (0 hunks)
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/page.tsx (0 hunks)
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/table.tsx (0 hunks)
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/util.ts (0 hunks)
  • apps/dashboard/app/(app)/settings/billing/page.tsx (1 hunks)
  • apps/dashboard/lib/audit.ts (1 hunks)
  • apps/dashboard/lib/tinybird.ts (0 hunks)
  • apps/dashboard/lib/trpc/routers/key/createRootKey.ts (1 hunks)
  • apps/dashboard/lib/trpc/routers/rbac.ts (1 hunks)
  • internal/clickhouse/schema/025_create_billable_verifications_v2.sql (1 hunks)
  • internal/clickhouse/schema/026_create_billable_ratelimits_v1.sql (1 hunks)
  • internal/clickhouse/src/billing.ts (2 hunks)
  • internal/clickhouse/src/index.ts (2 hunks)
💤 Files with no reviewable changes (5)
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/layout.tsx
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/page.tsx
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/table.tsx
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/logs/util.ts
  • apps/dashboard/lib/tinybird.ts
✅ Files skipped from review due to trivial changes (6)
  • apps/billing/src/lib/db-marketing/schemas/entries.ts
  • apps/billing/src/lib/db-marketing/schemas/firecrawl.ts
  • apps/billing/src/lib/db-marketing/schemas/keywords.ts
  • apps/billing/src/lib/db-marketing/schemas/searchQuery.ts
  • apps/billing/src/lib/db-marketing/schemas/serper.ts
  • apps/dashboard/app/(app)/semantic-cache/[gatewayId]/analytics/page.tsx
🔇 Additional comments (35)
internal/clickhouse/schema/026_create_billable_ratelimits_v1.sql (2)

29-30: LGTM! Down migration is correct

The down migration properly handles the cleanup in the correct order.


14-27: Verify source table schema and time handling

The materialized view looks functionally correct, but please verify:

  1. The schema of ratelimits.ratelimits_per_month_v1 matches expectations
  2. The time zone handling for toYear(time) and toMonth(time) is appropriate
internal/clickhouse/schema/025_create_billable_verifications_v2.sql (3)

29-30: LGTM! Down migration is correct

The down migration properly handles cleanup in the correct order (view before table).


13-26: Verify source table schema and consider performance optimizations

The materialized view looks functionally correct, but let's verify a few things:

Additionally, consider adding an index on the outcome column in the source table if not already present, as it's used in the WHERE clause.


1-30: Verify migration sequence and differences from v1

Let's ensure this migration is properly sequenced and understand the changes from v1.

✅ Verification successful

Migration sequence verified and differences from v1 are consistent

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Check migration ordering and compare with v1

# Check for previous versions
fd "billable_verifications.*\.sql" internal/clickhouse/schema/

# Look for related tables/views
rg "billable_verifications" internal/clickhouse/schema/

Length of output: 1831


Script:

#!/bin/bash
# Description: Corrected search for 'billable_verifications' within the clickhouse schema

# Look for related tables/views with the correct directory path
rg "billable_verifications" internal/clickhouse/schema/

Length of output: 1414


Script:

#!/bin/bash
# Description: Retrieve contents of v1 and v2 billable_verifications migration scripts

# Get contents of v1 table creation script
cat internal/clickhouse/schema/012_create_billing_billable_verifications_per_month_v1.sql

# Get contents of v1 materialized view creation script
cat internal/clickhouse/schema/013_create_billing_billable_verifications_per_month_mv_v1.sql

# Get contents of v2 migration script
cat internal/clickhouse/schema/025_create_billable_verifications_v2.sql

Length of output: 1691

internal/clickhouse/src/billing.ts (2)

51-51: Verify migration strategy for v2 table

The change from billable_verifications_per_month_v1 to v2 might require data migration. Please ensure:

  1. Data from v1 is migrated to v2 if needed
  2. No data loss during the transition
  3. Backward compatibility if both versions need to coexist

Let's check for migration scripts and table definitions:


16-16: Consider using v2 table for consistency

The function queries billing.billable_ratelimits_per_month_v1 while getBillableVerifications uses a v2 table. Should this also be using a v2 table for consistency?

Let's check if a v2 table exists:

✅ Verification successful

Confirmed: Using v1 table is correct

The billing.billable_ratelimits_per_month_v1 table is the current version. No v2 table exists.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for billable_ratelimits table definitions
rg -A 5 "CREATE TABLE.*billable_ratelimits.*v2"

Length of output: 49


Script:

#!/bin/bash
# List all billable_ratelimits table definitions
rg "CREATE TABLE.*billable_ratelimits"

# Search for migrations introducing v2 tables
rg "migration.*v2.*billable_ratelimits"

Length of output: 201

internal/clickhouse/src/index.ts (2)

2-2: LGTM: Clean import addition

The import statement correctly adds the new getBillableRatelimits function alongside the existing getBillableVerifications.


68-68: Verify API consumers are updated

The implementation cleanly adds rate limit billing capabilities following the established pattern. However, we should ensure all consumers of the billing API are updated to handle this new property.

apps/billing/src/lib/db-marketing/schemas/evals.ts (1)

1-2: LGTM! Import organization looks good.

The separation of imports improves readability and git diff clarity.

apps/billing/src/lib/search-query.ts (2)

6-6: LGTM: Import organization looks good

The reordering of imports maintains clarity and follows standard practices.


Line range hint 1-1: Verify database client usage

Given that the PR's objective is to move queries from "tb" to "ch" (ClickHouse), verify if this file should also be updated to use the ClickHouse client instead of the marketing database client.

#!/bin/bash
# Search for ClickHouse client usage patterns
rg -g '*.ts' 'from ["'\'']@/lib/(?:ch|clickhouse)' -A 5

# Search for similar files that might have been migrated
fd -e ts -e tsx -x grep -l "db-marketing" {} \; | xargs rg 'clickhouse'
apps/billing/src/lib/db-marketing/schemas/sections.ts (1)

Line range hint 16-29: Verify the nullable status of the markdown field

The markdown field is nullable while description is required. Please confirm if this is intentional, as typically markdown content would be the primary content source.

Consider making both fields required or document why markdown is optional while description is mandatory.

apps/dashboard/lib/audit.ts (2)

8-53: LGTM! Well-structured schema definition

The schema effectively validates and transforms audit log data with proper typing. The transformation to a more structured object format improves type safety and usability.


Line range hint 1-183: Verify database migration compatibility

The PR title mentions moving queries from Tinybird to ClickHouse, but this file doesn't explicitly show the database transition. Ensure that:

  1. The schema definitions are compatible with ClickHouse
  2. The data types are properly mapped
  3. The existing data is migrated correctly
✅ Verification successful

Database migration handled correctly

The presence of ClickHouse-specific configurations and migration scripts indicates that the migration from Tinybird to ClickHouse is being managed appropriately.

  • Migration scripts are present in the repository.
  • Schema definitions are compatible with ClickHouse.
🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Check for ClickHouse-specific types or configurations
rg -l 'clickhouse|ClickHouse' .

# Look for migration scripts
fd -g '*migrate*.{ts,js,sql}'

# Check for other files using similar schema
ast-grep --pattern 'schema.auditLog'

Length of output: 14050

apps/billing/src/trigger/glossary/create-pr.ts (2)

3-4: LGTM! Import statements are well organized.

The reordering of imports maintains a logical grouping while keeping the code functionality intact.


Line range hint 92-117: Handle potential race condition in branch management

The check-then-act pattern when handling existing branches could lead to race conditions. Consider using a more atomic approach or adding retry logic.
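
One way to narrow the window is to make creation itself the atomic step and handle the 422 GitHub returns when the ref already exists (a sketch; owner, repo, branch, and mainSha are illustrative):

// Create the branch optimistically; on 422 (ref exists), force-update it instead.
try {
  await octokit.git.createRef({ owner, repo, ref: `refs/heads/${branch}`, sha: mainSha });
} catch (error) {
  if ((error as { status?: number }).status !== 422) throw error;
  await octokit.git.updateRef({ owner, repo, ref: `heads/${branch}`, sha: mainSha, force: true });
}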

apps/billing/src/trigger/glossary/_generate-glossary-entry.ts (1)

1-2: LGTM! Import statements are well organized.

The reorganization of imports improves code readability by grouping related imports together (database-related and task-related imports).

Also applies to: 6-10

apps/billing/src/trigger/glossary/seo-meta-tags.ts (3)

3-7: LGTM! Import statements are well-organized.

The imports are logically grouped and provide clear dependencies for the functionality.


Line range hint 1-100: Consider migrating database queries to ClickHouse

Given the PR objective to move queries from Tinybird to ClickHouse, this file contains several database operations that could be candidates for migration:

  1. Querying existing entries
  2. Fetching related keywords
  3. Fetching top-ranking pages
  4. Updating entries with new meta tags

Let's verify the database schema migrations and existing query patterns:

#!/bin/bash
# Description: Check for ClickHouse schema definitions and query patterns

# Look for ClickHouse schema definitions
echo "Searching for ClickHouse schema definitions..."
rg -l "createTable.*ClickHouse" 

# Look for similar query patterns that have been migrated
echo "Searching for similar query patterns that have been migrated..."
rg "db\.query" --type ts

# Look for migration patterns
echo "Searching for migration patterns..."
fd -e sql -e ts "migrate|schema" 

Line range hint 22-33: Consider implementing rate limiting for OpenAI calls

The task uses GPT-4 which can be expensive. Consider implementing rate limiting to control costs.

Let's check for existing rate limiting implementations:

apps/billing/src/lib/firecrawl.ts (1)

Line range hint 19-116: Consider refactoring for better error handling and DRY principles

The function has several areas that could be improved:

  1. Error handling could be more specific about different types of errors (network, parsing, etc.).
  2. Database operations are duplicated across success/failure paths.
  3. The success/failure paths could be more clearly separated.

Consider extracting the database operations into a helper function:

// Note: taking the full $inferInsert (not a Partial) guarantees sourceUrl is
// present for the lookup below.
async function upsertFirecrawlResponse(data: typeof firecrawlResponses.$inferInsert) {
  await db
    .insert(firecrawlResponses)
    .values(data)
    .onDuplicateKeyUpdate({
      set: {
        ...data,
        updatedAt: new Date(),
      },
    });

  return await db.query.firecrawlResponses.findFirst({
    where: eq(firecrawlResponses.sourceUrl, data.sourceUrl),
  });
}

This would reduce duplication and make the main function cleaner.
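Both paths would then collapse to a single call each, along these lines (the field names here are assumed for illustration, not taken from the file):

// success path
return await upsertFirecrawlResponse({ sourceUrl, markdown: scrapeResult.markdown, success: true });

// failure path
return await upsertFirecrawlResponse({ sourceUrl, success: false, error: errorMessage });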

Let's verify the error handling patterns in similar functions:

apps/billing/src/lib/keywords.ts (2)

1-7: LGTM! Import statements are well organized.

The imports are logically grouped with AI-related imports at the top, followed by database and schema-related imports.


Line range hint 83-89: Verify the OpenAI model id is available.

The code uses gpt-4o-mini. Ensure this model id is enabled for your account and recognized by the OpenAI client version in use.

apps/billing/src/trigger/glossary/draft-sections.ts (1)

3-7: LGTM! Import changes improve type safety.

The reordering of imports and addition of proper type imports from the schema enhance type safety and maintainability.

apps/dashboard/app/(app)/apis/[apiId]/page.tsx (1)

Line range hint 42-58: Verify complete migration of queries to ClickHouse

The PR's objective is to move queries from Tinybird ("tb") to ClickHouse ("ch"). While this component already uses ClickHouse for verification and active key metrics, let's verify no queries remain to be migrated.

✅ Verification successful

Complete migration of queries to ClickHouse confirmed.

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any remaining database queries that might need migration
# Look for patterns indicating traditional database usage

# Check for direct database queries
rg -l 'from\s+schema\.' --type typescript

# Check for specific table access patterns
rg -l 'schema\.(keys|verifications|active_keys)' --type typescript

# Look for potential migration TODOs
rg -l 'TODO.*migrate.*to.*ch' --type typescript

Length of output: 267


Script:

#!/bin/bash
# Search for any remaining database queries that might need migration
# Look for patterns indicating traditional database usage

# Check for direct database queries
rg -l 'from\s+schema\.' --type ts

# Check for specific table access patterns
rg -l 'schema\.(keys|verifications|active_keys)' --type ts

# Look for potential migration TODOs
rg -l 'TODO.*migrate.*to.*ch' --type ts

Length of output: 3742


Script:

#!/bin/bash
# Search for any remaining "tb" database queries that might need migration to "ch"

# Check for Tinybird-specific schema references
rg -l 'schema_tb\.' --type ts

# Check for any connection strings or configurations referencing "tb"
rg -i -l 'connect.*tb' --type ts

# Check for comments or code indicating use of "tb"
rg -i -l 'use\s+tb' --type ts

Length of output: 98

apps/billing/src/trigger/glossary/evals.ts (4)

1-11: LGTM: Import section is well-organized

The import statements are properly organized and all imports are being used in the code.


Line range hint 14-27: LGTM: Type definitions are clear and well-structured

The type definitions provide good type safety and are used consistently throughout the tasks.


Line range hint 1-321: LGTM: Error handling and database operations are robust

The implementation demonstrates:

  • Proper error handling with AbortTaskRunError
  • Secure database operations with parameterized queries
  • Consistent JSON handling for stored data

Line range hint 29-156: Consider extracting system prompts to constants

The system prompts are currently embedded in the task implementations. Consider extracting them to constants for better maintainability and reusability.

+const RATING_SYSTEM_PROMPT = `You are a Senior Technical Content Evaluator...`;
+const RECOMMENDATION_SYSTEM_PROMPT = `You are a Senior Technical Content Strategist...`;

 export const getOrCreateRatingsTask = task({
   // ...
-    const systemPrompt = `You are a Senior Technical Content Evaluator...`;
+    const systemPrompt = RATING_SYSTEM_PROMPT;

Verify the OpenAI model configuration

The code uses the model id "gpt-4o-mini". Ensure it is correctly configured in your OpenAI client setup.

apps/billing/src/trigger/glossary/generate-outline.ts (2)

Line range hint 3-15: LGTM! Import organization improved

The consolidation of imports from db-marketing/schemas and the addition of type imports improves code organization and type safety.


Line range hint 197-211: LGTM! Well-structured schema definitions

The schema definitions are clear, properly typed with zod, and maintain good separation between initial and final outline structures.

apps/dashboard/app/(app)/settings/billing/page.tsx (1)

192-196: LGTM! Verify error handling for the new ClickHouse query.

The migration from ratelimits to clickhouse.billing.billableRatelimits looks good and maintains the same interface. The results are properly integrated into the billing calculations.
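If the query can reject, a defensive call site keeps the page rendering when ClickHouse is unavailable. A sketch only; the actual argument shape of billableRatelimits may differ:

const billableRatelimits = await clickhouse.billing
  .billableRatelimits({ workspaceId: workspace.id, year, month })
  .catch((err) => {
    console.error("failed to load billable ratelimits", err);
    return 0; // fall back to zero so the billing math still runs
  });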

Let's verify the error handling in the new ClickHouse implementation:

apps/dashboard/lib/trpc/routers/key/createRootKey.ts (1)

1-1: Verify the correctness of the updated import path

The import path for UnkeyAuditLog has been changed from "@/lib/tinybird" to "@/lib/audit". Please ensure that UnkeyAuditLog is correctly exported from the new module and that all references are updated accordingly.

Run the following script to confirm that UnkeyAuditLog is exported from "@/lib/audit":

✅ Verification successful

Import path for UnkeyAuditLog is correct and properly exported

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Description: Verify that UnkeyAuditLog is exported from "@/lib/audit"

# Find the 'audit.ts' file in the 'lib' directory and search for the export of UnkeyAuditLog
fd 'audit.ts' lib --exec rg 'export.*UnkeyAuditLog' {} \;

Length of output: 151


Script:

#!/bin/bash
# Description: Locate 'audit.ts' in the repository and verify that UnkeyAuditLog is exported

# Search for 'audit.ts' anywhere in the repository
audit_files=$(fd 'audit.ts' .)

if [ -z "$audit_files" ]; then
  echo "No audit.ts file found in the repository."
  exit 1
fi

# Check each found 'audit.ts' for the export of UnkeyAuditLog
for file in $audit_files; do
  rg 'export\s+.*UnkeyAuditLog' "$file"
done

Length of output: 412

apps/dashboard/lib/trpc/routers/rbac.ts (1)

3-3: Import statement updated appropriately

The addition of UnkeyAuditLog to the import ensures the necessary type is available for audit logging within the file.

Comment on lines +20 to +30
resources: z.string().transform((rs) =>
z
.array(
z.object({
type: z.string(),
id: z.string(),
meta: z.record(z.union([z.string(), z.number(), z.boolean(), z.null()])).optional(),
}),
)
.parse(JSON.parse(rs)),
),

⚠️ Potential issue

Add input validation before JSON.parse

The direct use of JSON.parse without try-catch could throw on invalid JSON input. Consider adding validation or error handling.

-    resources: z.string().transform((rs) =>
-      z
-        .array(
+    resources: z.string().transform((rs) => {
+      let parsed: unknown = [];
+      try {
+        parsed = JSON.parse(rs);
+      } catch {
+        // fall back to an empty array on malformed JSON
+      }
+      return z
+        .array(
           z.object({
             type: z.string(),
             id: z.string(),
             meta: z.record(z.union([z.string(), z.number(), z.boolean(), z.null()])).optional(),
           }),
         )
-        .parse(JSON.parse(rs)),
-    ),
+        .parse(parsed);
+    }),

Committable suggestion skipped: line range outside the PR's diff.

@@ -1,13 +1,13 @@
import { db } from "@/lib/db-marketing/client";

💡 Codebase verification

Update the import to use the ClickHouse client.
The file apps/billing/src/trigger/glossary/_generate-glossary-entry.ts is still importing the old db-marketing client. Please migrate this to use the ClickHouse client to align with the PR objectives.

🔗 Analysis chain

Verify database client migration to ClickHouse.

Based on the PR objectives to "move more queries from tb to ch", should this import still be using the marketing database client or should it be migrated to use the ClickHouse client?

🏁 Scripts executed

The following scripts were executed for the analysis:

Script:

#!/bin/bash
# Search for any new ClickHouse database client implementations
rg -g '!node_modules' -g '!dist' -A 5 'clickhouse|ch-client'

# Search for similar database operations that might have been migrated
ast-grep --pattern 'db.query.entries.findFirst'

Length of output: 61314

Comment on lines +163 to +165
const prompt = `Summarize the following content for the term "${
connectTo.term
}" that's ranking #${serperResult?.position ?? "unknown"}:

⚠️ Potential issue

Sanitize user input in prompt construction

The template literal directly interpolates user input (connectTo.term) into the prompt, which could potentially lead to prompt injection vulnerabilities.

Consider sanitizing the input:

-  const prompt = `Summarize the following content for the term "${
-    connectTo.term
-  }" that's ranking #${serperResult?.position ?? "unknown"}:
+  const prompt = `Summarize the following content for the term "${
+    sanitizeInput(connectTo.term)
+  }" that's ranking #${serperResult?.position ?? "unknown"}:

Committable suggestion skipped: line range outside the PR's diff.
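Note that sanitizeInput is not defined in this codebase. A minimal sketch of such a helper, with illustrative rules:

// Hypothetical helper: strips quoting characters that could close the quoted
// term in the prompt, collapses newlines, and caps the length.
function sanitizeInput(input: string, maxLength = 200): string {
  return input
    .replace(/["'`\\]/g, "")
    .replace(/[\r\n]+/g, " ")
    .slice(0, maxLength)
    .trim();
}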

const successOverTime: { x: string; y: number }[] = [];
const ratelimitedOverTime: { x: string; y: number }[] = [];
const usageExceededOverTime: { x: string; y: number }[] = [];
const disabledOverTime: { x: string; y: number }[] = [];
const insufficientPermissionsOverTime: { x: string; y: number }[] = [];
const expiredOverTime: { x: string; y: number }[] = [];
const forbiddenOverTime: { x: string; y: number }[] = [];
const emptyOverTime: { x: string; y: number }[] = [];
const _emptyOverTime: { x: string; y: number }[] = [];

🛠️ Refactor suggestion

Remove unused variable _emptyOverTime

The variable _emptyOverTime is declared but never used throughout the component. While prefixing with underscore indicates intent to mark it as unused, it would be cleaner to remove it entirely.

-  const _emptyOverTime: { x: string; y: number }[] = [];
📝 Committable suggestion

‼️ IMPORTANT
Carefully review the code before committing. Ensure that it accurately replaces the highlighted code, contains no missing lines, and has no issues with indentation. Thoroughly test & benchmark the code to ensure it meets the requirements.

Suggested change
const _emptyOverTime: { x: string; y: number }[] = [];
