Commit 11ae717 · add new doc
VinciGit00 committed Oct 21, 2024 · 1 parent ec9ef2b
Showing 3 changed files with 71 additions and 1 deletion.

41 changes: 40 additions & 1 deletion docs/source/introduction/overview.rst
@@ -22,6 +22,45 @@ This flexibility ensures that scrapers remain functional even when website layouts change.
We support many LLMs, including **GPT, Gemini, Groq, Azure, and Hugging Face** models,
as well as local models that can run on your machine using **Ollama**.

AI Models and Token Limits
==========================

ScrapeGraphAI supports a wide range of AI models from various providers. Each model has a specific token limit, which is important to consider when designing your scraping pipelines. Here is an overview of the supported models and their token limits:

OpenAI Models
-------------
- GPT-3.5 Turbo (16,385 tokens)
- GPT-4 (8,192 tokens)
- GPT-4 Turbo Preview (128,000 tokens)

Azure OpenAI Models
-------------------
- GPT-3.5 Turbo (16,385 tokens)
- GPT-4 (8,192 tokens)
- GPT-4 Turbo Preview (128,000 tokens)

Google AI Models
----------------
- Gemini Pro (128,000 tokens)
- Gemini 1.5 Pro (128,000 tokens)

Anthropic Models
----------------
- Claude Instant (100,000 tokens)
- Claude 2 (200,000 tokens)
- Claude 3 (200,000 tokens)

Mistral AI Models
-----------------
- Mistral Large (128,000 tokens)
- Open Mistral 7B (32,000 tokens)
- Open Mixtral 8x7B (32,000 tokens)

For a complete list of supported models and their token limits, please refer to the API documentation.

Understanding token limits is crucial for optimizing your scraping tasks. Larger token limits allow for processing more text in a single API call, which can be beneficial for scraping lengthy web pages or documents.


Library Diagram
===============

@@ -95,4 +134,4 @@ Sponsors
.. image:: ../../assets/transparent_stat.png
   :width: 15%
   :alt: Stat Proxies
   :target: https://dashboard.statproxies.com/?refferal=scrapegraph
3 changes: 3 additions & 0 deletions docs/source/modules/modules.rst
@@ -5,3 +5,6 @@ scrapegraphai
:maxdepth: 4

scrapegraphai

scrapegraphai.helpers.models_tokens

28 changes: 28 additions & 0 deletions docs/source/modules/scrapegraphai.helpers.models_tokens.rst
@@ -0,0 +1,28 @@
scrapegraphai.helpers.models_tokens module
==========================================

.. automodule:: scrapegraphai.helpers.models_tokens
   :members:
   :undoc-members:
   :show-inheritance:

This module contains a comprehensive dictionary of AI models and their corresponding token limits. The ``models_tokens`` dictionary is organized by provider (e.g., OpenAI, Azure OpenAI, Google AI) and maps each model name to its maximum token count.

Example usage:

.. code-block:: python

   from scrapegraphai.helpers.models_tokens import models_tokens

   # Get the token limit for GPT-4
   gpt4_limit = models_tokens['openai']['gpt-4']
   print(f"GPT-4 token limit: {gpt4_limit}")

   # Check the token limit for a specific model
   model_name = "gpt-3.5-turbo"
   if model_name in models_tokens['openai']:
       print(f"{model_name} token limit: {models_tokens['openai'][model_name]}")
   else:
       print(f"{model_name} not found in the models list")

This information helps users understand the capabilities and limitations of different AI models when designing their scraping pipelines.
