chore: update contributing guidelines, improve environment setup, and refactor search functionality #964

Open: wants to merge 8 commits into base: main
20 changes: 9 additions & 11 deletions CONTRIBUTING.md
@@ -12,13 +12,13 @@ First, start by installing dependencies:
2. pnpm [instructions](https://pnpm.io/installation)
3. redis [instructions](https://redis.io/docs/latest/operate/oss_and_stack/install/install-redis/)

-Set environment variables in a .env in the /apps/api/ directory you can copy over the template in .env.example.
+Set environment variables in a .env in the `apps/api/` directory you can copy over the template in .env.example.

To start, we wont set up authentication, or any optional sub services (pdf parsing, JS blocking support, AI features )

.env:

-```
+```bash
# ===== Required ENVS ======
NUM_WORKERS_PER_QUEUE=8
PORT=3002
@@ -46,16 +46,14 @@ LLAMAPARSE_API_KEY= #Set if you have a llamaparse key you'd like to use to parse
SLACK_WEBHOOK_URL= # set if you'd like to send slack server health status messages
POSTHOG_API_KEY= # set if you'd like to send posthog events like job logs
POSTHOG_HOST= # set if you'd like to send posthog events like job logs
-
-
```

### Installing dependencies

First, install the dependencies using pnpm.

```bash
-# cd apps/api # to make sure you're in the right folder
+# cd apps/api/ # to make sure you're in the right folder
pnpm install # make sure you have pnpm version 9+!
```

@@ -73,7 +71,7 @@ redis-server

### Terminal 2 - setting up workers

-Now, navigate to the apps/api/ directory and run:
+Now, navigate to the `apps/api/` directory and run:

```bash
pnpm run workers
@@ -84,7 +82,7 @@ This will start the workers who are responsible for processing crawl jobs.

### Terminal 3 - setting up the main server

-To do this, navigate to the apps/api/ directory and run if you don’t have this already, install pnpm here: https://pnpm.io/installation
+To do this, navigate to the `apps/api/` directory and run if you don’t have this already, install pnpm here: https://pnpm.io/installation
Next, run your server with:

```bash
@@ -95,15 +93,15 @@ pnpm run start

Alright: now let’s send our first request.

-```curl
+```bash
curl -X GET http://localhost:3002/test
```

This should return the response Hello, world!

If you’d like to test the crawl endpoint, you can run this

-```curl
+```bash
curl -X POST http://localhost:3002/v1/crawl \
-H 'Content-Type: application/json' \
-d '{
@@ -113,6 +111,6 @@ curl -X POST http://localhost:3002/v1/crawl \

## Tests:

-The best way to do this is run the test with `npm run test:local-no-auth` if you'd like to run the tests without authentication.
+The best way to do this is run the test with `pnpm run test:local-no-auth` if you'd like to run the tests without authentication.

-If you'd like to run the tests with authentication, run `npm run test:prod`
+If you'd like to run the tests with authentication, run `pnpm run test:prod`
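
The `.env` step described in the CONTRIBUTING.md diff above could be sanity-checked before starting the workers. The following is only a rough sketch and is not part of this PR: the helper names `parseEnv` and `missingKeys` are hypothetical, and `REQUIRED_KEYS` lists only the two required variables visible in the diff (`NUM_WORKERS_PER_QUEUE`, `PORT`); the real `.env.example` may require more.

```typescript
// Hypothetical helpers, not part of the Firecrawl codebase.
// Only the required variables visible in the diff are listed here (assumption).
const REQUIRED_KEYS = ["NUM_WORKERS_PER_QUEUE", "PORT"];

// Parse KEY=value lines, ignoring blank lines, full-line comments,
// and trailing " # comment" text.
function parseEnv(text: string): Record<string, string> {
  const env: Record<string, string> = {};
  for (const rawLine of text.split("\n")) {
    const line = rawLine.trim();
    if (line === "" || line.startsWith("#")) continue;
    const eq = line.indexOf("=");
    if (eq === -1) continue;
    const key = line.slice(0, eq).trim();
    const value = line.slice(eq + 1).split(" #")[0].trim();
    env[key] = value;
  }
  return env;
}

// Return the required keys that are absent or empty.
function missingKeys(env: Record<string, string>): string[] {
  return REQUIRED_KEYS.filter((key) => !env[key]);
}
```

Calling `missingKeys(parseEnv(fileContents))` on the contents of `apps/api/.env` would return an empty array when both visible required keys are set.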
2 changes: 1 addition & 1 deletion apps/api/package.json
Member

No need for this anymore, added prettier as dependency

Collaborator Author

I would say we can keep it here, as it will be removed once we merge #969.

Unless you insist that this should be removed in this PR, I can easily do it anyway.

@@ -133,4 +133,4 @@
"temp"
]
}
-}
+}
2 changes: 1 addition & 1 deletion apps/api/src/control.ts
@@ -1,2 +1,2 @@
// ! IN CASE OPENAI goes down, then activate the fallback -> true
-export const is_fallback = false;
+export const IS_FALLBACK = false;
23 changes: 6 additions & 17 deletions apps/api/src/controllers/v0/search.ts
@@ -6,22 +6,17 @@ import {
import { authenticateUser } from "../auth";
import { PlanType, RateLimiterMode } from "../../types";
import { logJob } from "../../services/logging/log_job";
-import { PageOptions, SearchOptions } from "../../lib/entities";
+import { PageOptions } from "../../lib/entities";
+import { SearchOptions } from "../../search/types";
import { search } from "../../search";
import { isUrlBlocked } from "../../scraper/WebScraper/utils/blocklist";
import { v4 as uuidv4 } from "uuid";
import { logger } from "../../lib/logger";
import { getScrapeQueue } from "../../services/queue-service";
import { addScrapeJob, waitForJob } from "../../services/queue-jobs";
import * as Sentry from "@sentry/node";
-import { getJobPriority } from "../../lib/job-priority";
-import { Job } from "bullmq";
-import {
-Document,
-fromLegacyCombo,
-fromLegacyScrapeOptions,
-toLegacyDocument,
-} from "../v1/types";
+import { getJobPriority, BASE_TEAM_ID } from "../../lib/job-priority";
+import { Document, fromLegacyCombo, toLegacyDocument } from "../v1/types";

export async function searchHelper(
jobId: string,
@@ -39,24 +34,18 @@ export async function searchHelper(
returnCode: number;
}> {
const query = req.body.query;
-const advanced = false;
if (!query) {
return { success: false, error: "Query is required", returnCode: 400 };
}

const tbs = searchOptions.tbs ?? undefined;
const filter = searchOptions.filter ?? undefined;
-let num_results = Math.min(searchOptions.limit ?? 7, 10);
-
-if (team_id === "d97c4ceb-290b-4957-8432-2b2a02727d95") {
-num_results = 1;
-}
-
+const num_results =
+team_id === BASE_TEAM_ID ? 1 : Math.min(req.body.limit ?? 7, 10);
const num_results_buffer = Math.floor(num_results * 1.5);

let res = await search({
query: query,
-advanced: advanced,
num_results: num_results_buffer,
tbs: tbs,
filter: filter,
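
The refactored limit computation in this hunk can be read as a small pure function. The following is a minimal sketch, assuming the same constants as the diff (the `BASE_TEAM_ID` value from `job-priority.ts`, a default of 7, a cap of 10, and a 1.5x buffer); the function name `resolveNumResults` and its return shape are hypothetical and not part of the PR.

```typescript
// Value copied from the diff of apps/api/src/lib/job-priority.ts.
const BASE_TEAM_ID = "d97c4ceb-290b-4957-8432-2b2a02727d95";

// Hypothetical helper mirroring the new expression in searchHelper.
function resolveNumResults(
  teamId: string,
  limit?: number,
): { numResults: number; numResultsBuffer: number } {
  // The base team is pinned to one result; every other team gets the
  // requested limit (default 7), capped at 10.
  const numResults = teamId === BASE_TEAM_ID ? 1 : Math.min(limit ?? 7, 10);
  // Over-fetch by 50% (rounded down), as in the hunk above.
  const numResultsBuffer = Math.floor(numResults * 1.5);
  return { numResults, numResultsBuffer };
}
```

Note that the removed line read the limit from `searchOptions.limit`, while the added line reads it from `req.body.limit`, so the two versions may behave differently if those values can diverge.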
2 changes: 1 addition & 1 deletion apps/api/src/lib/entities.ts
@@ -170,7 +170,7 @@ export class Document {
}
}

-export class SearchResult {
+export class SearchResult implements SearchResult {
url: string;
title: string;
description: string;
4 changes: 3 additions & 1 deletion apps/api/src/lib/job-priority.ts
@@ -28,6 +28,8 @@ export async function deleteJobPriority(team_id, job_id) {
}
}

+export const BASE_TEAM_ID = "d97c4ceb-290b-4957-8432-2b2a02727d95";
+
export async function getJobPriority({
plan,
team_id,
@@ -37,7 +39,7 @@ export async function getJobPriority({
team_id: string;
basePriority?: number;
}): Promise<number> {
-if (team_id === "d97c4ceb-290b-4957-8432-2b2a02727d95") {
+if (team_id === BASE_TEAM_ID) {
return 50;
}

2 changes: 0 additions & 2 deletions apps/api/src/lib/withAuth.ts
@@ -1,6 +1,4 @@
import { AuthResponse } from "../../src/types";
-import { logger } from "./logger";
-import * as Sentry from "@sentry/node";
import { configDotenv } from "dotenv";
configDotenv();

150 changes: 0 additions & 150 deletions apps/api/src/search/googlesearch.ts

This file was deleted.