diff --git a/tamingllms/_build/.doctrees/environment.pickle b/tamingllms/_build/.doctrees/environment.pickle index 8992530..d97236f 100644 Binary files a/tamingllms/_build/.doctrees/environment.pickle and b/tamingllms/_build/.doctrees/environment.pickle differ diff --git a/tamingllms/_build/.doctrees/notebooks/evals.doctree b/tamingllms/_build/.doctrees/notebooks/evals.doctree index 3816a64..7b8f783 100644 Binary files a/tamingllms/_build/.doctrees/notebooks/evals.doctree and b/tamingllms/_build/.doctrees/notebooks/evals.doctree differ diff --git a/tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree b/tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree index 0842ae2..5c22cf7 100644 Binary files a/tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree and b/tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree differ diff --git a/tamingllms/_build/.doctrees/notebooks/structured_output.doctree b/tamingllms/_build/.doctrees/notebooks/structured_output.doctree index 350da6e..a345ea0 100644 Binary files a/tamingllms/_build/.doctrees/notebooks/structured_output.doctree and b/tamingllms/_build/.doctrees/notebooks/structured_output.doctree differ diff --git a/tamingllms/_build/html/_images/langsmith.png b/tamingllms/_build/html/_images/langsmith.png index 7256206..9165eb7 100644 Binary files a/tamingllms/_build/html/_images/langsmith.png and b/tamingllms/_build/html/_images/langsmith.png differ diff --git a/tamingllms/_build/html/_images/outlines_state_machine.png b/tamingllms/_build/html/_images/outlines_state_machine.png new file mode 100644 index 0000000..a2f1dc1 Binary files /dev/null and b/tamingllms/_build/html/_images/outlines_state_machine.png differ diff --git a/tamingllms/_build/html/_sources/notebooks/evals.ipynb b/tamingllms/_build/html/_sources/notebooks/evals.ipynb index 92ee08c..6b5b1ca 100644 --- a/tamingllms/_build/html/_sources/notebooks/evals.ipynb +++ b/tamingllms/_build/html/_sources/notebooks/evals.ipynb @@ -1244,6 +1244,8 @@ "\n", "A major challenge with these leaderboards and benchmarks is test set contamination - when test data ends up in newer models' training sets, rendering the benchmarks ineffective. While some benchmarks try to address this through crowdsourced prompts and evaluations from humans or LLMs, these approaches introduce their own biases and struggle with difficult questions. **LiveBench** {cite}`white2024livebenchchallengingcontaminationfreellm` represents a novel solution, designed specifically to be resilient to both contamination and evaluation biases. As the first benchmark with continuously updated questions from recent sources, automated objective scoring, and diverse challenging tasks across multiple domains, LiveBench maintains its effectiveness even as models improve. Drawing from recent math competitions, research papers, news, and datasets, it creates contamination-free versions of established benchmark tasks. Current results show even top models achieving below 70% accuracy, demonstrating LiveBench's ability to meaningfully differentiate model capabilities. With monthly updates and an open collaborative approach, LiveBench aims to provide sustained value for model evaluation as the field advances.\n", "\n", + "Another notable benchmark is ZebraLogic {cite}`zebralogic2024`, which evaluates logical reasoning capabilities of LLMs through Logic Grid Puzzles - a type of Constraint Satisfaction Problem {cite}`brailsford1999constraint` commonly found in tests like the LSAT. 
These puzzles require assigning unique values to N houses across M different features based on given clues, demanding strategic reasoning and deduction to arrive at a unique correct solution. The benchmark's programmatically generated puzzles range from 2x2 to 6x6 in size and test LLMs using one-shot examples with reasoning steps. While humans can solve these puzzles through strategic methods like reductio ad absurdum and elimination, LLMs demonstrate significant limitations in this type of logical reasoning. Even the best-performing model, Claude 3.5 Sonnet, only achieves 33.4% accuracy across all puzzles and 12.4% on hard puzzles, with smaller models (7-10B parameters) solving less than 1% of hard puzzles as of December 2024. These results reveal critical gaps in LLMs' capabilities around counterfactual thinking, reflective reasoning, structured memorization, and compositional generalization.\n", + "\n", "A significant shift in AI evaluation came with the launch of the **Abstraction and Reasoning Corpus (ARC) Prize** {cite}`arcprize2024` by ARC Prize Inc., a non-profit for the public advancement of open artificial general intelligence. Hosted by Mike Knoop (Co-founder, Zapier) and François Chollet (Creator of ARC-AGI, Keras), this prize represents a paradigm shift in how we evaluate language models. Rather than focusing on narrow performance metrics, the ARC Prize assesses what it calls \"cognitive sufficiency\" - a model's ability to generate meaningful insights and tackle open-ended challenges. This new way to think about LLM evaluation emphasizes creative thinking, sophisticated reasoning, and the capacity to make genuinely useful contributions to human knowledge as we seek to define and measure what it means to achieve AGI (Artificial General Intelligence).\n", "\n", "\n", diff --git a/tamingllms/_build/html/_sources/notebooks/structured_output.ipynb b/tamingllms/_build/html/_sources/notebooks/structured_output.ipynb index 7615645..f82f023 100644 --- a/tamingllms/_build/html/_sources/notebooks/structured_output.ipynb +++ b/tamingllms/_build/html/_sources/notebooks/structured_output.ipynb @@ -637,18 +637,103 @@ "source": [ "### Outlines\n", "\n", - "Outlines {cite}`outlines2024` is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. This provides fine-grained control over the model's generation process. In that way, Outlines provides several powerful features:\n", + "Outlines {cite}`outlines2024` is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. 
\n", "\n", - "* **Multiple Choice Generation**: Restrict the LLM output to a predefined set of options.\n", - "* **Regex-based structured generation**: Guide the generation process using regular expressions.\n", - "* **Pydantic model**: Ensure the LLM output follows a Pydantic model.\n", - "* **JSON Schema**: Ensure the LLM output follows a JSON Schema." + "The authors solve the general guided generation problem {cite}`willard2023efficientguidedgenerationlarge`, which as a consequence solves the problem of structured output generation, in LLMs by introducing an efficient indexing approach that reformulates neural text generation using finite-state machines (FSMs).\n", + "\n", + "They define the next token generation as a random variable:\n", + "\n", + "$$s_{t+1} \\sim \\text{Categorical}(\\alpha) \\text{ where } \\alpha = \\text{LLM}(S_t, \\theta)$$\n", + "\n", + "Where:\n", + "\n", + "- $s_{t+1}$ is the next token to be generated\n", + "- $S_t = (s_1...s_t)$ represents a sequence of t tokens with $s_t \\in V$\n", + "- $V$ is the vocabulary with size $|V| = N$ (typically around $10^4$ or larger)\n", + "- $\\alpha \\in \\mathbb{R}^N$ is the output logits/probabilities over the vocabulary\n", + "- $\\theta$ is the set of trained parameters of the LLM\n", + "- $\\text{LLM}$ refers to a deep neural network trained on next-token-completion tasks\n", + "- $\\text{Categorical}(\\alpha)$ represents sampling from a categorical distribution with probabilities $\\alpha$\n", + "\n", + "When applying masking for guided generation, this becomes:\n", + "\n", + "$$\n", + "\\tilde{\\alpha} = m(S_t) \\odot \\alpha\n", + "$$\n", + "\n", + "$$\n", + "\\tilde{s}_{t+1} \\sim \\text{Categorical}(\\tilde{\\alpha})\n", + "$$\n", + "\n", + "Where:\n", + "\n", + "- $m: P(V) \\rightarrow {0,1}^N$ is a boolean mask function\n", + "- $\\odot$ represents element-wise multiplication\n", + "- $\\tilde{\\alpha}$ is the masked (constrained) probability distribution\n", + "- $\\tilde{s}_{t+1}$ is the next token sampled under constraints\n", + "\n", + "This formulation allows the masking operation to guide the generation process by zeroing out probabilities of invalid tokens according to the finite state machine states. But instead of checking the entire vocabulary (size N) at each generation step (O(N) complexity) to enforce output constraints, they convert constraints (regex/grammar) into FSM states and build an index mapping FSM states to valid vocabulary tokens. This achieves O(1) average complexity for token generation.\n", + "\n", + "In summary, there are two stages in the Outlines framework {cite}`vivien2024regex`:\n", + "\n", + "1. **Preprocessing Step**: Outlines converts a character-level deterministic finite automaton (DFA) testing whether a string matches a regex into a token-level DFA testing whether a token sequence is decoded in a string matching the regex.\n", + "\n", + "2. **Decoding Step**: At decoding time, the DFA is used to determine, for each new token, which potential tokens are allowed. Starting from the initial state of the DFA, the allowed tokens are determined by the outgoing transitions from the current state. The corresponding mask is applied to the next token probabilities and these probabilities are renormalized. A new token can then be sampled and the state of the DFA updated.\n", + "\n", + "At each step, the model's probability distribution is masked and renormalized according to the current state and valid transitions." 
+ ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an example, let's suppose we want to constrain the output of an LLM to the following set of options: \n", + "- Y/yes\n", + "- N/no\n", + "- N/never\n", + "- A/always\n", + "\n", + "\n", + "This can be done by creating a state machine that has a start state, an end state and a set of valid transitions between states with possible states represented as the following regex string: `r\"\\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)\"`.\n", + "\n", + "The state machine below illustrates how Outlines works under the hood {numref}`outlines_state_machine`, where:\n", + "- Prop: Represents the logit token probability given by the LLM\n", + "- Mask: Mask value of the transition as defined by the state machine\n", + "- Final: The renormalized token probability post-masking\n", + "\n", + "```{figure} ../_static/structured_output/outlines_state_machine.png\n", + "---\n", + "name: outlines_state_machine\n", + "alt: Outlines State Machine\n", + "scale: 50%\n", + "align: center\n", + "---\n", + "Outlines State Machine.\n", + "```\n", + "\n", + "The initial \"Start\" state contains a masking table that controls which tokens can begin the sequence. In this example, only characters from the set `[YyNnAa]` are allowed as valid first characters, with each having an assigned probability and mask value. The masking mechanism effectively filters out invalid tokens by setting their mask values to 0, ensuring only permitted transitions to the \"First\" state.\n", + "\n", + "After transitioning to the \"First\" state, the system continues to use probability masking to guide the sequence. For example, when receiving 'Y' as input, the masking table adjusts token probabilities to ensure valid continuations.\n", + "\n", + "This finite state machine architecture serves multiple purposes in controlling text generation:\n", + "\n", + "1. Managing token probabilities through strategic masking\n", + "2. Preventing invalid token sequences \n", + "3. Enforcing specific token patterns\n", + "4. Providing fine-grained control over token generation and validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "This provides fine-grained control over the model's generation process. In that way, Outlines, the Python package, provides several powerful controlled generation features:\n", + "\n", + "* **Regex-based structured generation**: Guide the generation process using regular expressions.\n", + "* **Multiple Choice Generation**: Restrict the LLM output to a predefined set of options.\n", + "* **Pydantic model**: Ensure the LLM output follows a Pydantic model.\n", + "* **JSON Schema**: Ensure the LLM output follows a JSON Schema.\n", + "\n", "Outlines can support major proprietary LLM APIs (e.g. OpenAI's via vLLM). However, one of its key advantages is the ability to ensure structured output for Open Source models, which often lack such guarantees by default." ] }, @@ -666,7 +751,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In this example, we will use a Qwen2.5-0.5B model, a lightweight open source model from Alibaba Cloud known for its strong performance despite its small size. The model excels at instruction following and structured generation tasks while being efficient enough to run locally via Hugging Face's `transformers` library." + "In this example, we will use a `Qwen2.5-0.5B` model, a lightweight open source model from Alibaba Cloud known for its strong performance despite its small size." 
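To connect these features to the example above, a minimal usage sketch might look like the following. It follows the Outlines 0.x Python API as documented at the time of writing; the prompt text and the instruct-variant model id are illustrative assumptions rather than the chapter's exact code.

```python
import outlines

# Load the small open source model locally via Hugging Face transformers
# (model id assumed to be the instruct variant of Qwen2.5-0.5B).
model = outlines.models.transformers("Qwen/Qwen2.5-0.5B-Instruct")

# Multiple choice: restrict the output to a predefined set of options
choice = outlines.generate.choice(model, ["Yes", "No", "Never", "Always"])
print(choice("Does the filing mention revenue growth? Answer: "))

# Regex-based structured generation: the same constraint as the state machine above
regex = outlines.generate.regex(model, r"\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)")
print(regex("Does the filing mention revenue growth? Answer: "))

# Pydantic / JSON Schema constrained generation is exposed in a similar way
# (e.g. outlines.generate.json), covered later in this chapter.
```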
] }, { @@ -772,7 +857,9 @@ "source": [ "### Ollama\n", "\n", - "Ollama is a popular tool that allows you to run large language models (LLMs) locally. It has recently added support for structured output generation. The current `ollama` implementation leverages llama.cpp GBNF (GGML BNF) grammars {cite}`llama_cpp_grammars` to enable structured output generation. llama.cpp GBNF forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It's essentially an extension of BNF (Backus-Naur Form) {cite}`backus_naur_form` with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. By enforcing these constraints during generation, GBNF ensures the model's output strictly adheres to the desired format.\n", + "Ollama is a popular tool that allows you to run large language models (LLMs) locally. It has recently added support for structured output generation. The current `ollama` implementation leverages llama.cpp GBNF (GGML BNF) grammars {cite}`llama_cpp_grammars` to enable structured output generation. \n", + "\n", + "llama.cpp GBNF forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It's essentially an extension of BNF (Backus-Naur Form) {cite}`backus_naur_form` with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. 
By enforcing these constraints during generation, GBNF ensures the model's output strictly adheres to the desired format.\n", "\n", "Ollama first introduced structured output generation in version 0.5.1 providing support for JSON output but highlighting additional formats are coming soon.\n" ] @@ -1017,7 +1104,7 @@ "\n", "## Acknowledgements\n", "\n", - "We would like to thank Cameron Pfiffer from the .txt team for his insightful review and feedback.\n" + "We would like to thank [Cameron Pfiffer](https://x.com/cameron_pfiffer) from the .txt team for his insightful review and feedback.\n" ] }, { diff --git a/tamingllms/_build/html/_static/structured_output/outlines_state_machine.mermaid b/tamingllms/_build/html/_static/structured_output/outlines_state_machine.mermaid new file mode 100644 index 0000000..c170783 --- /dev/null +++ b/tamingllms/_build/html/_static/structured_output/outlines_state_machine.mermaid @@ -0,0 +1,43 @@ +stateDiagram-v2 + %% Main FSM structure + [*] --> Start + Start --> First: [YyNnAa] + First --> Yes: e/o + First --> No: e/o + First --> Never: e + First --> Always: l + Yes --> End: s + No --> End: o + Never --> End: r + Always --> End: s + End --> [*] + + %% Initial State masking table + note left of Start + Initial State Masking: + Token │ Prob │ Mask │ Final + ──────────────────────────── + Y │ 0.15 │ 1 │ 0.25 + y │ 0.13 │ 1 │ 0.22 + N │ 0.14 │ 1 │ 0.23 + n │ 0.12 │ 1 │ 0.20 + A │ 0.06 │ 1 │ 0.10 + others│ 0.40 │ 0 │ 0.00 + end note + + %% First State masking example + note right of First + After 'Y' State Masking: + Token │ Prob │ Mask │ Final + ──────────────────────────── + e │ 0.30 │ 1 │ 1.00 + s │ 0.15 │ 0 │ 0.00 + a │ 0.10 │ 0 │ 0.00 + others│ 0.45 │ 0 │ 0.00 + end note + + %% Final State note + note left of End + Final State + Only accepting state + end note \ No newline at end of file diff --git a/tamingllms/_build/html/_static/structured_output/outlines_state_machine.png b/tamingllms/_build/html/_static/structured_output/outlines_state_machine.png new file mode 100644 index 0000000..a2f1dc1 Binary files /dev/null and b/tamingllms/_build/html/_static/structured_output/outlines_state_machine.png differ diff --git a/tamingllms/_build/html/notebooks/evals.html b/tamingllms/_build/html/notebooks/evals.html index 21993a7..977fe26 100644 --- a/tamingllms/_build/html/notebooks/evals.html +++ b/tamingllms/_build/html/notebooks/evals.html @@ -193,7 +193,7 @@
-

4. The Evals Gap

+

4. The Evals Gap

It doesn’t matter how beautiful your theory is,
it doesn’t matter how smart you are.
@@ -203,45 +203,45 @@

Contents

-

4.1. Non-Deterministic Generative Machines

+

4.1. Non-Deterministic Generative Machines

One of the most fundamental challenges when building products with Large Language Models (LLMs) is their generative and non-deterministic nature. Unlike traditional software systems where the same input reliably produces the same output, LLMs can generate novel text that may not exist in their training data, and produce different responses each time they’re queried - even with identical prompts and input data. This behavior is both a strength and a significant engineering and product challenge.

When you ask an LLM the same question multiple times, you’ll likely get different responses. This isn’t a bug - it’s a fundamental feature of how these models work. The “temperature” parameter, which controls the randomness of outputs, allows models to be creative and generate diverse responses. However, this same feature makes it difficult to build reliable, testable systems.

Consider a financial services company using LLMs to generate investment advice. The non-deterministic nature of these models means that:

@@ -252,16 +252,16 @@

-

4.1.1. Temperature and Sampling

+

4.1.1. Temperature and Sampling

The primary source of non-determinism in LLMs comes from their sampling strategies. During text generation, the model:

  1. Calculates probability distributions for each next token

  2. Samples from these distributions based on temperature settings

  3. -
  4. Uses techniques like nucleus sampling [Holtzman et al., 2020] or top-k sampling to balance creativity and coherence

  5. +
  6. Uses techniques like nucleus sampling [Holtzman et al., 2020] or top-k sampling to balance creativity and coherence

-

4.1.2. The Temperature Spectrum

+

4.1.2. The Temperature Spectrum

  • Temperature = 0: Most deterministic, but potentially repetitive

  • Temperature = 1: Balanced creativity and coherence

  • @@ -376,25 +376,25 @@

    [Raschka, 2024].

    +

    A temperature of 1 represents the unscaled probability scores for each token in the vocabulary. Decreasing the temperature closer to 0 sharpens the distribution, so the most likely token will have an even higher probability score. Conversely, increasing the temperature makes the distribution more uniform [Raschka, 2024].
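The scaling itself is simple to illustrate. The short sketch below (illustrative logits only, not tied to any particular model) divides the logits by the temperature before applying the softmax, reproducing the sharpening and flattening effect described above:

```python
import numpy as np

def softmax_with_temperature(logits: np.ndarray, temperature: float) -> np.ndarray:
    """Turn raw logits into a probability distribution, scaled by temperature."""
    scaled = logits / temperature
    exp = np.exp(scaled - scaled.max())  # subtract max for numerical stability
    return exp / exp.sum()

logits = np.array([2.0, 1.0, 0.5, 0.1])  # made-up next-token logits
for t in (0.1, 1.0, 2.0):
    print(f"T={t}: {softmax_with_temperature(logits, t).round(3)}")
# Low temperature concentrates probability on the top token;
# high temperature flattens the distribution toward uniform.
```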

    In this simple experiment, we use an LLM to write a single-statement executive summary of an input financial filing. We observe that even a simple parameter like temperature can dramatically alter model behavior in ways that are difficult to systematically assess. At temperature 0.0, responses are consistent but potentially too rigid. At 1.0, outputs become more varied but less predictable. At 2.0, responses can be wildly different and often incoherent. This non-deterministic behavior makes traditional software testing approaches inadequate.

    The implications for evaluation are critical. How can one effectively test an LLM-powered system when the same prompt can yield radically different outputs based on a single parameter? Traditional testing relies on predictable inputs and outputs, but LLMs force us to grapple with probabilistic behavior. While lower temperatures may seem safer for critical applications, they don’t necessarily eliminate the underlying uncertainty. This highlights the need for new evaluation paradigms that can handle both deterministic and probabilistic aspects of LLM behavior.

-

4.2. Emerging Properties

+

4.2. Emerging Properties

Beyond their non-deterministic nature, LLMs present another fascinating challenge: emergent abilities that spontaneously arise as models scale up in size. These abilities - from basic question answering to complex reasoning - aren’t explicitly programmed but rather emerge “naturally” as the models grow larger and are trained on more data. This makes evaluation fundamentally different from traditional software testing, where capabilities are explicitly coded and can be tested against clear specifications.

Emerging Properties
-

Fig. 4.1 Emergent abilities of large language models and the scale [Wei et al., 2022].

+

Fig. 4.1 Emergent abilities of large language models and the scale [Wei et al., 2022].

Fig. 4.1 provides a list of emergent abilities of large language models and the scale. The relationship between model scale and emergent abilities follows a fascinating non-linear pattern. Below certain size thresholds, specific abilities may be completely absent from the model - it simply cannot perform certain tasks, no matter how much you try to coax them out. However, once the model reaches critical points in its scaling journey, these abilities can suddenly manifest in what researchers call a phase transition - a dramatic shift from inability to capability. This unpredictable emergence of capabilities stands in stark contrast to traditional software development, where features are deliberately implemented and can be systematically tested.

The implications for evaluation are pressing. While conventional software testing relies on stable test suites and well-defined acceptance criteria, LLM evaluation must contend with a constantly shifting landscape of capabilities. What worked to evaluate a 7B parameter model may be completely inadequate for a 70B parameter model that has developed new emergent abilities. This dynamic nature of LLM capabilities forces us to fundamentally rethink our approach to testing and evaluation.

-

4.3. Problem Statement

+

4.3. Problem Statement

Consider a practical example that illustrates these challenges: building a Math AI tutoring system for children powered by an LLM. In traditional software development, you would define specific features (like presenting math problems or checking answers) and write tests to verify each function. But with LLMs, you’re not just testing predefined features - you’re trying to evaluate emergent capabilities like adapting explanations to a child’s level, maintaining engagement through conversational learning, and providing age-appropriate safety-bound content.

This fundamental difference raises critical questions about evaluation:

-

4.8. Tools

+

4.8. Tools

-

4.8.1. LightEval

-

LightEval [Fourrier et al., 2023] is a lightweight framework for evaluating LLMs on a variety of standard and bespoke metrics and tasks, across multiple inference backends, via a Python SDK and CLI.

+

4.8.1. LightEval

+

LightEval [Fourrier et al., 2023] is a lightweight framework for evaluating LLMs on a variety of standard and bespoke metrics and tasks, across multiple inference backends, via a Python SDK and CLI.

As a motivating example, consider a scenario where financial data has been extracted from SEC financial filings and requires econometric analysis. Tasks like estimating autoregressive models for time series forecasting or conducting hypothesis tests on market efficiency are common in financial analysis. Let’s evaluate how well different models perform on this type of task.

First, we need to select a benchmark to assess LLMs capabilities in this domain. MMLU has a sub-benchmark called Econometrics we can use for this task. Table 4.4 shows a sample of the benchmark dataset from MMLU Econometrics. It consists of multiple-choice questions from econometrics and expected answers.

@@ -1435,13 +1436,13 @@

return pipeline -

Fig. 4.8 shows a schematic representation of its key components. As inference engine, we leverage accelerate for distributed evaluation. lighteval also supports other inference backends such as vllm and tgi.

+

Fig. 4.8 shows a schematic representation of its key components. As inference engine, we leverage accelerate for distributed evaluation. lighteval also supports other inference backends such as vllm and tgi.

First, we instantiate an EvaluationTracker which manages result storage, in this example kept in a local directory output_dir, and tracks detailed evaluation metrics, optionally pushed to HuggingFace Hub.

Next, we instantiate an object of the class PipelineParameters which, in this example, configures the pipeline for parallel processing with a temporary cache in cache_dir, also setting the maximum number of samples to process to max_samples. Then, in BaseModelConfig we set up the LLM model we would like to evaluate, defined in pretrained.

-
+
LightEval Python SDK Sample Conceptual Overview.
-

Fig. 4.8 LightEval Python SDK Sample Conceptual Overview.

+

Fig. 4.8 LightEval Python SDK Sample Conceptual Overview.

This setup allows for systematic evaluation of language model performance on specific tasks while handling distributed computation and result tracking.

@@ -1456,7 +1457,7 @@

[Face, 2024] and metrics [Face, 2024]. The available tasks span multiple categories and benchmarks including BigBench, MMLU, TruthfulQA, WinoGrande, and HellaSwag. The framework also supports standard NLP evaluation metrics including BLEU, ROUGE, Exact Match, F1 Score, and Accuracy.

+

LightEval provides a comprehensive set of evaluation tasks [Face, 2024] and metrics [Face, 2024]. The available tasks span multiple categories and benchmarks including BigBench, MMLU, TruthfulQA, WinoGrande, and HellaSwag. The framework also supports standard NLP evaluation metrics including BLEU, ROUGE, Exact Match, F1 Score, and Accuracy.

In our case, we choose to evaluate our LLMs on the MMLU econometrics task using zero-shot learning. Hence, we define the task as follows:

-

We would like to compare the performance of multiple open source models on the MMLU econometrics task. While we could download and evaluate each model locally, we prefer instead to evaluate them on a remote server to save time and resources. LightEval enables serving the model on a TGI-compatible server/container and then running the evaluation by sending requests to the server [Face, 2024].

+

We would like to compare the performance of multiple open source models on the MMLU econometrics task. While we could download and evaluate each model locally, we prefer instead to evaluate them on a remote server to save time and resources. LightEval enables serving the model on a TGI-compatible server/container and then running the evaluation by sending requests to the server [Face, 2024].

For that purpose, we can leverage HuggingFace Serverless Inference API (or dedicated inference API) and set a configuration file for LightEval as shown below, where <MODEL-ID> is the model identifier on HuggingFace (e.g. meta-llama/Llama-3.2-1B-Instruct) and <HUGGINGFACE-TOKEN> is the user’s HuggingFace API token.

model:
   type: "tgi"
@@ -1506,17 +1507,17 @@ 

- + - + - +

Llama3.2 Instruct

LLaMA architecture-based pretrained and instruction-tuned generative models

Llama-3.2-1B-Instruct
Llama-3.2-3B-Instruct

[Meta AI, 2024]

[Meta AI, 2024]

Qwen2.5 Instruct

Instruction-tuned LLMs family built by Alibaba Cloud

Qwen2.5-0.5B-Instruct
Qwen2.5-1.5B-Instruct
Qwen2.5-3B-Instruct

[Face, 2024, Hui et al., 2024, Yang et al., 2024]

[Face, 2024, Hui et al., 2024, Yang et al., 2024]

SmolLM2 Instruct

Instruction-tuned family of compact language models built by HuggingFace

SmolLM2-360M-Instruct
SmolLM2-1.7B-Instruct

[Allal et al., 2024]

[Allal et al., 2024]

@@ -1529,10 +1530,10 @@

[Hugging Face, 2024]. Its integration with the Hugging Face ecosystem and modular architecture make it particularly powerful for evaluating open source models. For further details, visit the official repository [Fourrier et al., 2023].

+

In summary, LightEval is a simple yet flexible and comprehensive framework for evaluating LLMs across a wide variety of tasks and metrics. It can serve as a first step in selecting your next LLM for a specific task given the exponential growth in number of (open source) models available [Hugging Face, 2024]. Its integration with the Hugging Face ecosystem and modular architecture make it particularly powerful for evaluating open source models. For further details, visit the official repository [Fourrier et al., 2023].

-

4.8.2. LangSmith

+

4.8.2. LangSmith

Let’s revisit our evaluation example where we were interested in evaluating the quality of summaries generated by different (smaller and cheaper) LLM models compared to a benchmark model (larger and more expensive). Recall the setup:

  • Benchmark model: gpt-4o

  • @@ -1937,146 +1938,154 @@

    Fig. 4.11.

    -
    -LangSmith Experiment Results +

Since we decided to upload the results, we can also visualize the experiment results in LangSmith as shown in Fig. 4.11.

    +
    +LangSmith Experiment Results
    -

    Fig. 4.11 LangSmith Experiment Results

    +

    Fig. 4.11 LangSmith Experiment Results

-

4.8.3. PromptFoo

-

PromptFoo [PromptFoo, 2024] is a framework for evaluating the quality of prompts for LLMs.

+

4.8.3. PromptFoo

+

PromptFoo [PromptFoo, 2024] is a framework for evaluating the quality of prompts for LLMs.

-

4.9. References

-
-
-[ALB+24] +

4.9. References

+
+
+[ALB+24]

Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Lewis Tunstall, Agustín Piqueres, Andres Marafioti, Cyril Zakka, Leandro von Werra, and Thomas Wolf. Smollm2 - with great data, comes great performance. 2024.

-
+
[Are24]

Judge Arena. Judge arena: evaluating llm outputs with llms. https://judgearena.com/, 2024. Accessed: 2024.

-
+
+[BPS99] +

Sally C. Brailsford, Chris N. Potts, and Barbara M. Smith. Constraint satisfaction problems: algorithms and applications. European Journal of Operational Research, 119(3):557–581, 1999. URL: https://www.sciencedirect.com/science/article/pii/S0377221798003646, doi:https://doi.org/10.1016/S0377-2217(98)00364-6.

+
+
[CTJ+21]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. 2021. URL: https://arxiv.org/abs/2107.03374, arXiv:2107.03374.

-
+
[CZS+24]

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, and Ion Stoica. Chatbot arena: an open platform for evaluating llms by human preference. 2024. URL: https://arxiv.org/abs/2403.04132, arXiv:2403.04132.

-
-[Cho24a] +
+[Cho24a]

Francois Chollet. Arc prize 2024 results. ARC Prize Website, 12/08/2024. URL: https://arcprize.org/2024-results.

-
-[Cho24b] +
+[Cho24b]

Francois Chollet. Abstraction and reasoning challenge. ARC Prize Website, 2024. URL: https://arcprize.org/.

-
+
[DGLH24]

Yann Dubois, Balázs Galambosi, Percy Liang, and Tatsunori B. Hashimoto. Length-controlled alpacaeval: a simple way to debias automatic evaluators. 2024. URL: https://arxiv.org/abs/2404.04475, arXiv:2404.04475.

-
-[Fac24a] +
+[Fac24a]

Hugging Face. Available tasks - lighteval wiki. https://github.com/huggingface/lighteval/wiki/Available-Tasks, 2024. Accessed: 2024.

-
-[Fac24b] +
+[Fac24b]

Hugging Face. Evaluate the model on a server or container - lighteval wiki. https://github.com/huggingface/lighteval/wiki/Evaluate-the-model-on-a-server-or-container, 2024. Accessed: 2024.

-
-[Fac24c] +
+[Fac24c]

Hugging Face. Gpt-2 documentation - hugging face transformers. https://huggingface.co/docs/transformers/model_doc/gpt2, 2024. Accessed: 2024.

-
+
[Fac24d]

Hugging Face. Llm as a judge. https://huggingface.co/learn/cookbook/en/llm_judge, 2024. Accessed: 2024.

-
-[Fac24e] +
+[Fac24e]

Hugging Face. Metric list - lighteval wiki. https://github.com/huggingface/lighteval/wiki/Metric-List, 2024. Accessed: 2024.

-
+
[Fac24f]

Hugging Face. Open llm leaderboard. Hugging Face Spaces, 2024. URL: https://huggingface.co/spaces/open-llm-leaderboard/blog.

-
+
[FHWT23] -(1,2) +(1,2)

Clémentine Fourrier, Nathan Habib, Thomas Wolf, and Lewis Tunstall. Lighteval: a lightweight framework for llm evaluation. 2023. URL: https://github.com/huggingface/lighteval.

-
+
[HBB+21]

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. 2021. URL: https://arxiv.org/abs/2009.03300, arXiv:2009.03300.

-
+
[HBD+20]

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. 2020. URL: https://arxiv.org/abs/1904.09751, arXiv:1904.09751.

-
-[HYC+24] +
+[HYC+24]

Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, and others. Qwen2. 5-coder technical report. arXiv preprint arXiv:2409.12186, 2024.

-
+
[LXS+24] (1,2,3)

Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, and Shuai Ma. Leveraging large language models for nlg evaluation: advances and challenges. 2024. URL: https://arxiv.org/abs/2401.07103, arXiv:2401.07103.

-
+
[LBL+23]

Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, and Yuta Koreeda. Holistic evaluation of language models. 2023. URL: https://arxiv.org/abs/2211.09110, arXiv:2211.09110.

-
+
+[LBC24] +

Bill Yuchen Lin, Ronan Le Bras, and Yejin Choi. Zebralogic: benchmarking the logical reasoning ability of language models. 2024. URL: https://huggingface.co/spaces/allenai/ZebraLogic.

+
+
[LHE22]

Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: measuring how models mimic human falsehoods. 2022. URL: https://arxiv.org/abs/2109.07958, arXiv:2109.07958.

-
+
[Ras24]

Sebastian Raschka. Build A Large Language Model (From Scratch). Manning, 2024. ISBN 978-1633437166. URL: https://www.manning.com/books/build-a-large-language-model-from-scratch.

-
+
[SRR+23]

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. 
Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. 
Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, Zirui Wang, and Ziyi Wu. Beyond the imitation game: quantifying and extrapolating the capabilities of language models. 2023. URL: https://arxiv.org/abs/2206.04615, arXiv:2206.04615.

-
+
[WPN+19]

Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Superglue: a stickier benchmark for general-purpose language understanding systems. Advances in Neural Information Processing Systems, 2019.

-
+
[WSM+19]

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Glue: a multi-task benchmark and analysis platform for natural language understanding. 2019. URL: https://arxiv.org/abs/1804.07461, arXiv:1804.07461.

-
+
[WTB+22]

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. Emergent abilities of large language models. 2022. URL: https://arxiv.org/abs/2206.07682, arXiv:2206.07682.

-
+
[WDR+24]

Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Yann LeCun, Tom Goldstein, Willie Neiswanger, and Micah Goldblum. Livebench: a challenging, contamination-free llm benchmark. 2024. URL: https://arxiv.org/abs/2406.19314, arXiv:2406.19314.

-
-[YYH+24] +
+[YYH+24]

An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, and Zhihao Fan. Qwen2 technical report. arXiv preprint arXiv:2407.10671, 2024.

-
+
[ZCS+23]

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging llm-as-a-judge with mt-bench and chatbot arena. 2023. URL: https://arxiv.org/abs/2306.05685, arXiv:2306.05685.

-
-[HuggingFace24] +
+[HuggingFace24]

Hugging Face. Number of models on hugging face. https://huggingface.co/spaces/huggingface/open-source-ai-year-in-review-2024?day=4, 2024. Accessed: 12/06/2024.

-
-[MetaAI24] +
+[MetaAI24]

Meta AI. Meta llama models on hugging face. https://huggingface.co/meta-llama, 2024. Accessed: 2024.

-
-[PromptFoo24] +
+[PromptFoo24]

PromptFoo. Promptfoo - open-source prompt engineering toolkit. https://www.promptfoo.dev/, 2024. Accessed: 12/06/2024.

diff --git a/tamingllms/_build/html/notebooks/output_size_limit.html b/tamingllms/_build/html/notebooks/output_size_limit.html index 2a859e3..c588fc0 100644 --- a/tamingllms/_build/html/notebooks/output_size_limit.html +++ b/tamingllms/_build/html/notebooks/output_size_limit.html @@ -194,7 +194,7 @@
-

2. Output Size Limitations

+

2. Output Size Limitations

Only those who will risk going too far can possibly find out how far one can go.

—T.S. Eliot

@@ -202,34 +202,34 @@

Contents

-

2.1. What are Token Limits?

+

2.1. What are Token Limits?

Tokens are the basic units that LLMs process text with. A token can be as short as a single character or as long as a complete word. In English, a general rule of thumb is that 1 token ≈ 4 characters or ¾ of a word.

The max_output_tokens parameter, often available in modern LLMs, determines the maximum length of text that an LLM can generate in a single response. Table 2.1 shows the max_output_tokens for several key models, which typically range between 4096 and 16384 tokens. Contrary to what one might expect, the model does not “summarize the answer” so that it stays within the max_output_tokens limit. Instead, it will stop once it reaches this limit, even mid-sentence, i.e. the response may be truncated.
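As a quick illustration of the rule of thumb above, a tokenizer library such as tiktoken can be used to see how many tokens a piece of text consumes. The encoding name below is the one used by many recent OpenAI models and is an assumption; other models use different tokenizers.

```python
import tiktoken

text = "Only those who will risk going too far can possibly find out how far one can go."
enc = tiktoken.get_encoding("cl100k_base")
tokens = enc.encode(text)
print(len(text), "characters ->", len(tokens), "tokens")
# Typical English text comes out to roughly 4 characters per token.
```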

@@ -289,7 +289,7 @@

-

2.2. Problem Statement

+

2.2. Problem Statement

The max_output_tokens limit in LLMs poses a significant challenge for users who need to generate long outputs, as it may result in truncated content and/or incomplete information.

  1. Truncated Content: Users aiming to generate extensive content, such as detailed reports or comprehensive articles, may find their outputs abruptly cut off due to the max_output_tokens limit. This truncation can result in incomplete information and disrupt the flow of the content.

  2. @@ -298,7 +298,7 @@

    -

    2.3. Content Chunking with Contextual Linking

    +

    2.3. Content Chunking with Contextual Linking

    Content chunking with contextual linking is a technique used to manage the max_output_tokens limitation by breaking down long-form content into smaller, manageable chunks. This approach allows the LLM to focus on smaller sections of the input, enabling it to generate more complete and detailed responses for each chunk while maintaining coherence and context across the entire output.

    1. Chunking the Content: The input content is split into smaller chunks. This allows the LLM to process each chunk individually, focusing on generating a complete and detailed response for that specific section of the input.

    2. @@ -309,7 +309,7 @@

      max_output_tokens limitation and generate coherent long-form content without truncation.

      Let’s examine an example implementation of this technique.

      -

      2.3.1. Generating long-form content

      +

      2.3.1. Generating long-form content

      • Goal: Generate a long-form report analyzing a company’s financial statement.

      • Input: A company’s 10K SEC filing.

      • @@ -322,7 +322,7 @@

        Fig. 2.1 illustrates the process we will follow for handling long-form content generation with Large Language Models through “Content Chunking with Contextual Linking.” It shows how input content is first split into manageable chunks using a chunking function (e.g. CharacterTextSplitter with tiktoken tokenizer), then each chunk is processed sequentially while maintaining context from previous chunks. For each chunk, the system updates the context, generates a dynamic prompt with specific parameters, makes a call to the LLM chain, and stores the response. After all chunks are processed, the individual responses are combined with newlines to create the final report, effectively working around the token limit constraints of LLMs while maintaining coherence across the generated content.

        -

        2.3.1.1. Step 1: Chunking the Content

        +

        2.3.1.1. Step 1: Chunking the Content

        There are different methods for chunking, and each of them might be appropriate for different situations. However, we can broadly group chunking strategies in two types:

• Fixed-size Chunking: This is the most common and straightforward approach to chunking. We simply decide the number of tokens in our chunk and, optionally, whether there should be any overlap between them. In general, we will want to keep some overlap between chunks to make sure that the semantic context doesn’t get lost between chunks. Fixed-sized chunking may be a reasonable path in many common cases. Compared to other forms of chunking, fixed-sized chunking is computationally cheap and simple to use since it doesn’t require the use of any specialized techniques or libraries. A minimal sketch of this approach is shown after this list.

        • @@ -359,7 +359,7 @@
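Below is a minimal sketch of fixed-size chunking with overlap, counted in tokens. The chapter’s implementation uses a chunking function such as CharacterTextSplitter with the tiktoken tokenizer; this sketch uses tiktoken directly, and the tokenizer choice and chunk sizes are illustrative assumptions.

```python
import tiktoken

def fixed_size_chunks(text: str, chunk_size: int = 1000, overlap: int = 100) -> list[str]:
    """Split text into ~chunk_size-token chunks with `overlap` tokens shared between neighbors."""
    assert 0 <= overlap < chunk_size
    enc = tiktoken.get_encoding("cl100k_base")  # tokenizer choice is an assumption
    tokens = enc.encode(text)
    chunks = []
    step = chunk_size - overlap
    for start in range(0, len(tokens), step):
        chunks.append(enc.decode(tokens[start:start + chunk_size]))
        if start + chunk_size >= len(tokens):
            break
    return chunks
```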

          -

          2.3.1.2. Step 2: Writing the Base Prompt Template

          +

          2.3.1.2. Step 2: Writing the Base Prompt Template

          We will write a base prompt template which will serve as a foundational structure for all chunks, ensuring consistency in the instructions and context provided to the language model. The template includes the following parameters:

          • role: Defines the role or persona the model should assume.

          • @@ -426,7 +426,7 @@

            -

            2.3.1.3. Step 3: Constructing Dynamic Prompt Parameters

            +

            2.3.1.3. Step 3: Constructing Dynamic Prompt Parameters

            Now, we will write a function (get_dynamic_prompt_template) that constructs prompt parameters dynamically for each chunk.

            @@ -479,7 +479,7 @@

            -

            2.3.1.4. Step 4: Generating the Report

            +

            2.3.1.4. Step 4: Generating the Report

            Finally, we will write a function that generates the actual report by calling the LLMChain with the dynamically updated prompt parameters for each chunk and concatenating the results at the end.
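Conceptually, this loop ties the previous steps together: process each chunk in order, carry forward context from what has already been generated, and concatenate the per-chunk responses. The sketch below is a simplified stand-in for the chapter’s implementation, not the exact code; `call_llm` is a hypothetical placeholder for whatever LLM chain or client is used, and the prompt wording is illustrative.

```python
def call_llm(prompt: str) -> str:
    # Hypothetical stand-in for an LLM chain/client call.
    return f"[response for a prompt of {len(prompt)} characters]"

def generate_report(chunks: list[str]) -> str:
    responses: list[str] = []
    context = ""  # carries forward what has been generated so far
    for i, chunk in enumerate(chunks, start=1):
        prompt = (
            f"You are a financial analyst writing part {i} of {len(chunks)} of a report.\n"
            f"Context from previous parts:\n{context}\n\n"
            f"Analyze the following section of the filing:\n{chunk}"
        )
        response = call_llm(prompt)
        responses.append(response)
        context = response[-1000:]  # keep only the tail to bound prompt size
    # Combine the per-chunk responses with newlines to form the final report
    return "\n".join(responses)
```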

            @@ -538,7 +538,7 @@

            -

            2.3.1.5. Example Usage

            +

            2.3.1.5. Example Usage

            # Load the text from sample 10K SEC filing
            @@ -606,7 +606,7 @@ 

            -

            2.3.2. Discussion

            +

            2.3.2. Discussion

            Results from the generated report present a few interesting aspects:

            • Coherence: The generated report demonstrates a high level of coherence. The sections are logically structured, and the flow of information is smooth. Each part of the report builds upon the previous sections, providing a comprehensive analysis of Apple Inc.’s financial performance and key risk factors. The use of headings and subheadings helps in maintaining clarity and organization throughout the document.

            • @@ -620,7 +620,7 @@

              -

              2.4. Implications

              +

              2.4. Implications

              Implementing context chunking with contextual linking is a practical solution to manage the output size limitations of LLMs. However, this approach comes with its own set of implications that developers must consider.

              1. Increased Development Complexity: Implementing strategies to overcome the maximum output token length introduces additional layers of complexity to the application design. It necessitates meticulous management of context across multiple outputs to maintain coherence. Ensuring that each chunk retains the necessary context for the conversation or document can be challenging and often requires advanced logic to handle transitions seamlessly.

              2. @@ -630,7 +630,7 @@

                -

                2.5. Future Considerations

                +

                2.5. Future Considerations

                As models evolve, we can expect several advancements that will significantly impact how we handle output size limitations:

                1. Contextual Awareness: Future LLMs will likely have improved contextual awareness - or as Mustafa Suleyman would call “infinite memory”, enabling them to better understand and manage the context of a conversation or document over long interactions. This will reduce the need for repetitive context setting and improve the overall user experience.

                2. @@ -642,11 +642,11 @@

                  -

                  2.6. Conclusion

                  +

                  2.6. Conclusion

                  In conclusion, while managing output size limitations in LLMs presents significant challenges, it also drives innovation in application design and optimization strategies. By implementing techniques such as context chunking, efficient prompt templates, and graceful fallbacks, developers can mitigate these limitations and enhance the performance and cost-effectiveness of their applications. As the technology evolves, advancements in contextual awareness, token efficiency, and memory management will further empower developers to build more robust and scalable LLM-powered systems. It is crucial to stay informed about these developments and continuously adapt to leverage the full potential of LLMs while addressing their inherent constraints.

        -

        2.7. References

        +

        2.7. References

        [LangChain24] diff --git a/tamingllms/_build/html/notebooks/structured_output.html b/tamingllms/_build/html/notebooks/structured_output.html index da0e1e2..8c03a27 100644 --- a/tamingllms/_build/html/notebooks/structured_output.html +++ b/tamingllms/_build/html/notebooks/structured_output.html @@ -29,6 +29,8 @@ + + @@ -196,7 +198,7 @@
3. Wrestling with Structured Output

        In limits, there is freedom. Creativity thrives within structure.

        —Julia B. Cameron

3.1. Introduction

        Large language models (LLMs) excel at generating human-like text, but they often struggle to produce output in a structured format consistently. This poses a significant challenge when we need LLMs to generate data that can be easily processed by other systems, such as databases, APIs, or other software applications. Sometimes, even with a well-crafted prompt, an LLM might produce an unstructured response when a structured one is expected. This can be particularly challenging when integrating LLMs into systems that require specific data formats.

        As a motivating example, consider the following simple task: Given a segment of a SEC financial filing, generate a two-person discussion about the key financial data from the text in JSON format, simulating what would be a real-world discussion about the underlying companies’ disclosed financial information. We would like to generate a structured output that can be easily parsed and integrated with other systems.

        Throughout this notebook, we will consider as input a segment of a sample SEC filing of Apple Inc.

3.2. Problem Statement

        Obtaining structured output from LLMs presents several significant challenges:

        • Inconsistency: LLMs often produce unpredictable results, sometimes generating well-structured output and other times deviating from the expected format.

3.3. User Needs

What user needs drive the demand for LLM output constraints when building LLM-based applications? In a recent work by Google Research [Liu et al., 2024], the authors explore the user need for constraints on the output of large language models, drawing on a survey of 51 industry professionals who use LLMs in their work. These needs can be broadly categorized as follows:

          1. Improving Developer Efficiency and Workflow

          • Reducing Trial and Error in Prompt Engineering: Developers find the process of crafting prompts to elicit desired output formats to be time-consuming, often involving extensive testing and iteration. LLM output constraints could make this process more efficient and predictable.

3.4. Solutions

            Several strategies and tools can be employed to address the challenges of structured output from LLMs.

3.4.1. Strategies

            • Schema Guidance: Providing the LLM with a clear schema or blueprint of the desired output structure helps to constrain its generation and improve consistency. This can be achieved by using tools like Pydantic to define the expected data structure and then using that definition to guide the LLM’s output.

            • Output Parsing: When LLMs don’t natively support structured output, parsing their text output using techniques like regular expressions or dedicated parsing libraries can extract the desired information. For example, you can use regular expressions to extract specific patterns from the LLM’s output, or you can use libraries like Pydantic to parse the output into structured data objects.
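As a minimal illustration of the output-parsing strategy above, the sketch below pulls label/value pairs out of free-form LLM text with a regular expression; the example text, labels, and pattern are illustrative rather than taken from the original notebook.

import re

# Illustrative free-form LLM output
llm_text = "Revenue: 391,035 million USD\nNet income: 93,736 million USD"

# Capture "Label: value" pairs, one per line
pattern = re.compile(r"^(?P<label>[A-Za-z ]+):\s*(?P<value>[\d,]+)", re.MULTILINE)
parsed = {m.group("label").strip(): m.group("value") for m in pattern.finditer(llm_text)}
print(parsed)  # {'Revenue': '391,035', 'Net income': '93,736'}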

3.4.2. Techniques and Tools

3.4.2.1. One-Shot Prompts

              In one-shot prompting, you provide a single example of the desired output format within the prompt.
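A minimal sketch of a one-shot prompt with the OpenAI Python client might look like the following; the model name, prompt wording, and field names are assumptions made for illustration and are not the original notebook's code.

from openai import OpenAI

client = OpenAI()

prompt = """Extract the company name and fiscal year end from the filing excerpt.
Return JSON in exactly this format (one-shot example):
{"company": "Acme Corp", "fiscal_year_end": "2023-12-31"}

Filing excerpt: Apple Inc. filed its annual report for the fiscal year ended September 28, 2024."""

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    messages=[{"role": "user", "content": prompt}],
)
print(response.choices[0].message.content)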

3.4.2.2. Structured Output with Provider-Specific APIs

One-shot prompting is a simple technique that can lead to material improvements in structured output, though it may not be sufficient for complex (e.g. nested) structures and/or when the model's output needs to be restricted to a specific set of options or types.

              Provider-specific APIs can offer ways to handle those challenges. We will explore two approaches here using OpenAI’s API:

3.4.2.3. JSON Mode

JSON mode is a feature provided by most LLM API providers, such as OpenAI, that allows the model to generate output in JSON format. This is particularly useful when you need structured data as a result, such as when parsing the output programmatically or integrating it with other systems that require JSON input. As depicted in Fig. 3.1, JSON mode is implemented by instructing the LLM to use JSON as the response format and, optionally, defining a target schema.
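A minimal sketch of JSON mode with the OpenAI Python client follows; the model name and message contents are assumptions, and note that JSON mode only guarantees syntactically valid JSON, so the target schema still has to be described in the prompt.

from openai import OpenAI

client = OpenAI()

response = client.chat.completions.create(
    model="gpt-4o-mini",  # assumed model name
    response_format={"type": "json_object"},  # enable JSON mode
    messages=[
        {"role": "system", "content": "Return a JSON object with keys 'company' and 'fiscal_year'."},
        {"role": "user", "content": "Apple Inc. annual report for the fiscal year ended September 28, 2024."},
    ],
)
print(response.choices[0].message.content)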

Fig. 3.1 JSON Mode.

3.4.3. LangChain

LangChain is a framework designed to simplify the development of LLM applications. It provides an abstraction layer over many LLM providers, including OpenAI, and offers several tools for parsing structured output.

                In particular, LangChain offers the with_structured_output method, which can be used with LLMs that support structured output APIs, allowing you to enforce a schema directly within the prompt.

Further details on .with_structured_output() can be found here.
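As a rough sketch of how with_structured_output can be combined with a Pydantic schema (the model name, schema fields, and input sentence are assumptions for illustration):

from pydantic import BaseModel
from langchain_openai import ChatOpenAI

class FilingSummary(BaseModel):
    company: str
    fiscal_year: int
    net_sales_usd_millions: float

llm = ChatOpenAI(model="gpt-4o-mini")  # assumed model name
structured_llm = llm.with_structured_output(FilingSummary)

result = structured_llm.invoke(
    "Apple Inc. reported net sales of 391,035 million USD for the fiscal year ended September 28, 2024."
)
print(result)  # a FilingSummary instance rather than raw text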

3.4.4. Outlines

Outlines [Outlines, 2024] is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options.

The authors solve the general guided generation problem [Willard and Louf, 2023] in LLMs, which as a consequence solves the problem of structured output generation, by introducing an efficient indexing approach that reformulates neural text generation using finite-state machines (FSMs).

They define next token generation as a random variable:

\[s_{t+1} \sim \text{Categorical}(\alpha) \text{ where } \alpha = \text{LLM}(S_t, \theta)\]

Where:

• \(s_{t+1}\) is the next token to be generated

• \(S_t = (s_1 \dots s_t)\) represents a sequence of t tokens with \(s_t \in V\)

• \(V\) is the vocabulary with size \(|V| = N\) (typically around \(10^4\) or larger)

• \(\alpha \in \mathbb{R}^N\) is the vector of output logits/probabilities over the vocabulary

• \(\theta\) is the set of trained parameters of the LLM

• \(\text{LLM}\) refers to a deep neural network trained on next-token-completion tasks

• \(\text{Categorical}(\alpha)\) represents sampling from a categorical distribution with probabilities \(\alpha\)

When applying masking for guided generation, this becomes:

\[\tilde{\alpha} = m(S_t) \odot \alpha\]

\[\tilde{s}_{t+1} \sim \text{Categorical}(\tilde{\alpha})\]

Where:

• \(m: P(V) \rightarrow \{0,1\}^N\) is a boolean mask function

• \(\odot\) represents element-wise multiplication

• \(\tilde{\alpha}\) is the masked (constrained) probability distribution

• \(\tilde{s}_{t+1}\) is the next token sampled under constraints

              This formulation allows the masking operation to guide the generation process by zeroing out probabilities of invalid tokens according to the finite state machine states. But instead of checking the entire vocabulary (size N) at each generation step (O(N) complexity) to enforce output constraints, they convert constraints (regex/grammar) into FSM states and build an index mapping FSM states to valid vocabulary tokens. This achieves O(1) average complexity for token generation.
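To make the masking and renormalization step concrete, here is a toy numerical sketch with a five-token vocabulary and hand-picked probabilities (not the paper's implementation, and no FSM index is built here):

import numpy as np

# Unconstrained next-token probabilities alpha = LLM(S_t, theta) over a toy vocabulary
vocab = ["yes", "no", "never", "always", "maybe"]
alpha = np.array([0.40, 0.25, 0.10, 0.05, 0.20])

# Boolean mask m(S_t): suppose the current FSM state only allows "yes" and "no"
mask = np.array([1, 1, 0, 0, 0], dtype=float)

# Element-wise masking followed by renormalization
alpha_tilde = mask * alpha
alpha_tilde /= alpha_tilde.sum()

# Sample the constrained next token from Categorical(alpha_tilde)
next_token = np.random.choice(vocab, p=alpha_tilde)
print(alpha_tilde)  # [0.6154 0.3846 0.     0.     0.    ]
print(next_token)   # "yes" or "no"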

In summary, there are two stages in the Outlines framework [Tran-Thien, 2024]:

1. Preprocessing Step: Outlines converts a character-level deterministic finite automaton (DFA) testing whether a string matches a regex into a token-level DFA testing whether a token sequence is decoded in a string matching the regex.

2. Decoding Step: At decoding time, the DFA is used to determine, for each new token, which potential tokens are allowed. Starting from the initial state of the DFA, the allowed tokens are determined by the outgoing transitions from the current state. The corresponding mask is applied to the next token probabilities and these probabilities are renormalized. A new token can then be sampled and the state of the DFA updated.

              At each step, the model’s probability distribution is masked and renormalized according to the current state and valid transitions.

As an example, let's suppose we want to constrain the output of an LLM to the following set of options:

• Y/yes

• N/no

• N/never

• A/always

              This can be done by creating a state machine that has a start state, an end state and a set of valid transitions between states with possible states represented as the following regex string: r"\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)".

The state machine in Fig. 3.2 illustrates how Outlines works under the hood, where:

• Prop: Represents the logit token probability given by the LLM

• Mask: Mask value of the transition as defined by the state machine

• Final: The renormalized token probability post-masking

Fig. 3.2 Outlines State Machine.

              The initial “Start” state contains a masking table that controls which tokens can begin the sequence. In this example, only characters from the set [YyNnAa] are allowed as valid first characters, with each having an assigned probability and mask value. The masking mechanism effectively filters out invalid tokens by setting their mask values to 0, ensuring only permitted transitions to the “First” state.

After transitioning to the “First” state, the system continues to use probability masking to guide the sequence. For example, when receiving ‘Y’ as input, the masking table adjusts token probabilities to ensure valid continuations.

This finite state machine architecture serves multiple purposes in controlling text generation:

1. Managing token probabilities through strategic masking

2. Preventing invalid token sequences

3. Enforcing specific token patterns

4. Providing fine-grained control over token generation and validation

This provides fine-grained control over the model’s generation process. In that way, Outlines, the Python package, provides several powerful controlled generation features:

• Multiple Choice Generation: Restrict the LLM output to a predefined set of options.

• Regex-based structured generation: Guide the generation process using regular expressions.

• Pydantic model: Ensure the LLM output follows a Pydantic model.

• JSON Schema: Ensure the LLM output follows a JSON Schema.

install transformers

In this example, we will use a Qwen2.5-0.5B model, a lightweight open source model from Alibaba Cloud known for its strong performance despite its small size.

        import outlines
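The rest of the original code cell is truncated in this diff; a minimal sketch of multiple-choice and regex-constrained generation with Outlines might look like the following, assuming the pre-1.0 outlines API and the Qwen/Qwen2.5-0.5B-Instruct checkpoint (both assumptions here):

import outlines

# Load the model through the Hugging Face transformers backend
model = outlines.models.transformers("Qwen/Qwen2.5-0.5B-Instruct")

# Multiple choice generation: restrict the output to a fixed set of options
choice_generator = outlines.generate.choice(model, ["Yes", "No", "Never", "Always"])
print(choice_generator("Does Apple Inc. report net sales by geographic segment? Answer:"))

# Regex-based structured generation: reuse the regex from the state machine example
regex_generator = outlines.generate.regex(model, r"\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)")
print(regex_generator("Is the filing audited? Answer:"))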
3.4.5. Ollama

        Ollama is a popular tool that allows you to run large language models (LLMs) locally. It has recently added support for structured output generation. The current ollama implementation leverages llama.cpp GBNF (GGML BNF) grammars [Ggerganov, 2024] to enable structured output generation.


        llama.cpp GBNF forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It’s essentially an extension of BNF (Backus-Naur Form) [Wikipedia contributors, 2024] with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. By enforcing these constraints during generation, GBNF ensures the model’s output strictly adheres to the desired format.

Ollama first introduced structured output generation in version 0.5.1, providing support for JSON output and noting that additional formats are coming soon.

        Let’s replicate our previous structured output generation example with Ollama. First, make sure you have Ollama installed. You can find installation instructions here.

        curl -fsSL https://ollama.com/install.sh | sh
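Once Ollama is installed and a model has been pulled, a minimal sketch with the ollama Python client and its JSON format option could look like this; the llama3.2 model name and prompt are assumptions for illustration.

import ollama

response = ollama.chat(
    model="llama3.2",  # assumed local model, e.g. pulled with `ollama pull llama3.2`
    messages=[
        {
            "role": "user",
            "content": (
                "Extract the company name and fiscal year from: "
                "'Apple Inc. annual report for the fiscal year ended September 28, 2024.' "
                "Respond as JSON with keys 'company' and 'fiscal_year'."
            ),
        }
    ],
    format="json",  # constrain the response to valid JSON
)
print(response["message"]["content"])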
3.5. Discussion

3.5.1. Comparing Solutions

The choice of framework for structured LLM output depends heavily on specific constraints, requirements and use cases. LangChain is the most widely used LLM framework today, with a large developer community, but its structured output support depends on the underlying LLM provider. Ollama enables straightforward local deployment and experimentation, democratizing access to LLMs while fostering privacy and control; however, today it only offers JSON format, with further formats to come. Outlines emerges as a solution with great flexibility and control over output structure while providing support for a wide range of LLMs. Table 3.1 provides a summary comparison of the different frameworks.

3.5.2. Best Practices

  • Clear Schema Definition: Define the desired output structure clearly. This can be done in several ways including schemas, types, or Pydantic models as appropriate. This ensures the LLM knows exactly what format is expected.

  • Descriptive Naming: Use meaningful names for fields and elements in your schema. This makes the output more understandable and easier to work with.
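A brief sketch of the first two practices using a Pydantic schema follows; the field names, descriptions, and constraints are illustrative assumptions rather than a recommended canonical schema.

from pydantic import BaseModel, Field

class RiskFactor(BaseModel):
    """One risk factor extracted from a 10-K filing."""
    title: str = Field(description="Short, human-readable name of the risk")
    category: str = Field(description="For example: macroeconomic, supply chain, regulatory")
    severity: int = Field(ge=1, le=5, description="1 = minor, 5 = critical")

class RiskReport(BaseModel):
    company: str
    fiscal_year: int
    risk_factors: list[RiskFactor]

# The derived JSON Schema can be handed to any structured output API
print(RiskReport.model_json_schema())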

3.5.3. Research and Ongoing Debate

The use of structured output for Large Language Models (LLMs) is a developing area. While the ability to constrain LLM outputs offers clear benefits in parsing, robustness, and integration, there is growing debate on whether it also comes at the cost of performance and reasoning abilities. Research in this area should be taken with a grain of salt: findings are mixed and often depend on the specific task and model family at hand; furthermore, model families are not always comparable and are updated frequently. Nonetheless, early findings provide some interesting insights as to why there is no one-size-fits-all solution when it comes to structured output from LLMs.

There is some evidence indicating that LLMs may have bias in their handling of different output formats [Long et al., 2024]. The study examined common output structures like multiple-choice answers, wrapped text, lists, and key-value mappings. The authors analyzed key LLM model families, namely Gemma, Mistral, and ChatGPT, uncovering bias across multiple tasks and formats. The researchers attributed these biases to the models’ underlying token distributions for different formats. An example of this format bias emerged in the comparison between JSON and YAML outputs. While models like Mistral and Gemma excelled at generating JSON structures, they performed notably worse with YAML. Their YAML outputs often contained extraneous information that degrades output quality. This disparity likely stems from JSON’s prevalence in training data, highlighting how a format’s popularity directly influences model performance. While the studied models can probably be considered outdated by now, given how rapidly models are updated, it is important to remark that addressing format bias is critical for advancing LLMs and ensuring their reliable application in real-world scenarios.

Recent research “Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models” [Tam et al., 2024] suggests that imposing format restrictions on LLMs might impact their performance, particularly in reasoning-intensive tasks. Further evidence [Aider, 2024] suggests LLMs may produce lower quality code if they’re asked to return it as part of a structured JSON response, in particular:

    • Potential performance degradation: Enforcing structured output, especially through constrained decoding methods like JSON-mode, can negatively impact an LLM’s reasoning abilities. This is particularly evident in tasks that require multi-step reasoning or complex thought processes.

    • Overly restrictive schemas: Imposing strict schemas can limit the expressiveness of LLM outputs and may hinder their ability to generate creative or nuanced responses. In certain cases, the strictness of the schema might outweigh the benefits of structured output.

    • Increased complexity in prompt engineering: Crafting prompts that effectively guide LLMs to generate structured outputs while maintaining performance can be challenging. It often requires careful consideration of the schema, the task instructions, and the desired level of detail in the response.

On the other hand, those findings are not without criticism. The .txt team challenges the work of [Tam et al., 2024]. The rebuttal argues that structured generation, when done correctly, actually improves performance.

Fig. 3.3 Structured vs Unstructured Results by .txt team.

The .txt team presents compelling evidence through their reproduction of the paper’s experiments. While their unstructured results align with the original paper’s findings, their structured results paint a dramatically different picture - demonstrating that structured generation actually improves performance (see Fig. 3.3). The team has made their experimental notebooks publicly available on GitHub for independent verification [Dottxt, 2024].

    .txt team identifies several flaws in the methodology of “Let Me Speak Freely?” that they believe led to inaccurate conclusions:

    • The paper finds that structured output improves performance on classification tasks but doesn’t reconcile this finding with its overall negative conclusion about structured output.

3.6. Conclusion

      Extracting structured output from LLMs is crucial for integrating them into real-world applications. By understanding the challenges and employing appropriate strategies and tools, developers can improve the reliability and usability of LLM-powered systems, unlocking their potential to automate complex tasks and generate valuable insights.

3.7. Acknowledgements

We would like to thank Cameron Pfiffer from the .txt team for his insightful review and feedback.

3.8. References

[Aid24] Aider. Code in JSON: structured output for LLMs. https://aider.chat/2024/08/14/code-in-json.html, 2024. Accessed: 2024.

[Dot24] Dottxt. Say what you mean: demos. https://github.com/dottxt-ai/demos/tree/main/say-what-you-mean, 2024. Accessed: 2024.

[Gge24] Ggerganov. Llama.cpp grammars documentation. https://github.com/ggerganov/llama.cpp/blob/master/grammars/README.md, 2024. Accessed: 2024.

[LLF+24] Michael Xieyang Liu, Frederick Liu, Alexander J. Fiannaca, Terry Koo, Lucas Dixon, Michael Terry, and Carrie J. Cai. "We need structured output": towards user-centered constraints on large language model output. In Extended Abstracts of the CHI Conference on Human Factors in Computing Systems, CHI EA '24. New York, NY, USA, 2024. Association for Computing Machinery. URL: https://doi.org/10.1145/3613905.3650756, doi:10.1145/3613905.3650756.

[LNS+24] Do Xuan Long, Hai Nguyen Ngoc, Tiviatis Sim, Hieu Dao, Shafiq Joty, Kenji Kawaguchi, Nancy F Chen, and Min-Yen Kan. LLMs are biased towards output formats! Systematically evaluating and mitigating output format bias of LLMs. arXiv preprint arXiv:2408.08656, 2024.

[Out24] Outlines. Type-safe structured output from LLMs. https://dottxt-ai.github.io/outlines/latest/, 2024. Accessed: 2024.

[TWT+24] Zhi Rui Tam, Cheng-Kuang Wu, Yi-Lin Tsai, Chieh-Yen Lin, Hung-yi Lee, and Yun-Nung Chen. Let me speak freely? A study on the impact of format restrictions on performance of large language models. 2024. URL: https://arxiv.org/abs/2408.02442, arXiv:2408.02442.

[TT24] Vivien Tran-Thien. LLM decoding with regex constraints. Blog post, 2024. URL: https://vivien000.github.io/blog/journal/llm-decoding-with-regex-constraints.html.

[WL23] Brandon T. Willard and Rémi Louf. Efficient guided generation for large language models. 2023. URL: https://arxiv.org/abs/2307.09702, arXiv:2307.09702.

[Wikipediacontributors24] Wikipedia contributors. Backus-Naur form. https://en.wiktionary.org/wiki/Backus-Naur_form, 2024. Accessed: 2024.

2, "755": 2, "089": 2, "nterm": 2, "nincreas": 2, "t139": 2, "t194": 2, "nforeign": 2, "express": [2, 4], "var": 2, "mont": 2, "carlo": 2, "simul": [2, 4], "maximum": [2, 3], "interv": 2, "538": 2, "669": 2, "underli": [2, 4], "nindex": 2, "tpage": 2, "nconsolid": 2, "n29": 2, "n30": 2, "sheet": 2, "n31": 2, "n32": 2, "n33": 2, "nnote": 2, "n34": 2, "nreport": 2, "n48": 2, "nall": 2, "omit": [2, 4], "submiss": 2, "nyear": 2, "n2023": 2, "n2022": 2, "nnet": 2, "t294": 2, "866": 2, "t298": 2, "085": 2, "t316": 2, "199": 2, "t96": 2, "ncost": 2, "t185": 2, "233": 2, "t189": 2, "282": 2, "471": 2, "119": 2, "855": 2, "t22": 2, "075": 2, "352": 2, "t214": 2, "137": 2, "t223": 2, "546": 2, "t123": 2, "216": 2, "t119": 2, "437": 2, "t269": 2, "565": 2, "334": 2, "485": 2, "736": 2, "103": 2, "t93": 2, "995": 2, "t99": 2, "nearn": 2, "nbasic": 2, "ndilut": 2, "08": [2, 4], "343": 2, "783": 2, "744": 2, "231": 2, "215": 2, "963": 2, "095": 2, "812": 2, "547": 2, "325": 2, "819": 2, "nsee": 2, "translat": 2, "t395": 2, "765": 2, "511": 2, "unreal": 2, "832": 2, "t323": 2, "212": 2, "nadjust": 2, "337": 2, "717": 2, "394": 2, "138": 2, "850": 2, "563": 2, "104": 2, "t204": 2, "t253": 2, "816": 2, "899": 2, "272": 2, "t98": 2, "016": 2, "652": 2, "t88": 2, "531": 2, "nasset": 2, "ncurrent": 2, "ncash": 2, "943": 2, "965": 2, "228": 2, "590": 2, "naccount": 2, "410": 2, "508": 2, "nvendor": 2, "t32": 2, "833": 2, "477": 2, "ninventori": 2, "286": 2, "331": 2, "287": 2, "695": 2, "t152": 2, "987": 2, "t143": 2, "566": 2, "t91": 2, "479": 2, "544": 2, "t45": 2, "680": 2, "715": 2, "834": 2, "t64": 2, "758": 2, "t211": 2, "993": 2, "t209": 2, "017": 2, "t364": 2, "980": 2, "t352": 2, "nliabil": 2, "t68": 2, "960": 2, "t62": 2, "611": 2, "304": 2, "t58": 2, "829": 2, "ndefer": 2, "249": 2, "061": 2, "ncommerci": 2, "967": 2, "985": 2, "t10": 2, "912": 2, "822": 2, "t176": 2, "392": 2, "t145": 2, "308": 2, "750": 2, "281": 2, "888": 2, "t49": 2, "848": 2, "638": 2, "t308": 2, "030": 2, "t290": 2, "ncommit": 2, "nsharehold": 2, "400": 2, "116": 2, "786": 2, "550": 2, "n83": 2, "276": 2, "naccumul": 2, "deficit": 2, "154": 2, "214": 2, "172": 2, "452": 2, "950": 2, "146": 2, "t50": 2, "672": 2, "t63": 2, "090": 2, "nbegin": 2, "849": 2, "365": 2, "423": 2, "346": 2, "175": 2, "withheld": 2, "settlement": 2, "award": 2, "521": 2, "971": 2, "t12": 2, "034": 2, "t11": 2, "nend": 2, "t83": 2, "nretain": 2, "068": 2, "562": 2, "ndividend": 2, "218": 2, "793": 2, "612": 2, "099": 2, "454": 2, "846": 2, "77": 2, "046": 2, "186": 2, "109": 2, "t163": 2, "rsu": 2, "t0": 2, "98": 2, "94": 2, "32": 2, "737": 2, "929": 2, "ndepreci": 2, "445": 2, "519": 2, "688": 2, "038": 2, "266": 2, "227": 2, "006": 2, "788": 2, "356": 2, "271": 2, "520": 2, "618": 2, "484": 2, "731": 2, "684": 2, "499": 2, "020": 2, "889": 2, "448": 2, "552": 2, "031": 2, "t118": 2, "254": 2, "t110": 2, "543": 2, "t122": 2, "151": 2, "48": 2, "656": 2, "513": 2, "76": 2, "923": 2, "nproce": 2, "211": 2, "686": 2, "917": 2, "135": 2, "828": 2, "446": 2, "447": 2, "959": 2, "708": 2, "086": 2, "935": 2, "705": 2, "354": 2, "nfinanc": 2, "441": 2, "431": 2, "223": 2, "234": 2, "025": 2, "841": 2, "nrepurchas": 2, "949": 2, "89": 2, "402": 2, "465": 2, "nrepay": 2, "958": 2, "repay": 2, "978": 2, "955": 2, "361": 2, "581": 2, "160": 2, "121": 2, "983": 2, "108": 2, "488": 2, "794": 2, "760": 2, "nsupplement": 2, "102": 2, "t18": 2, "679": 2, "573": 2, "33": 2, "nbasi": 2, "prior": 2, "reclassifi": 2, "nrevenu": 2, "remit": 2, "straight": 2, "vest": 2, 
"treat": 2, "sold": 2, "nderiv": 2, "combin": [2, 3, 4], "nonleas": 2, "34": 2, "entitl": 2, "reward": 2, "commenc": 2, "deliveri": 2, "stand": 2, "alon": 2, "ssp": 2, "object": [2, 4], "icloud": 2, "siri": 2, "map": [2, 4], "discount": 2, "lack": [2, 4], "undeliv": 2, "unbil": 2, "accordingli": 2, "n26": 2, "n37": 2, "35": 2, "proport": 2, "moder": 2, "64": 2, "dilut": 2, "nnumer": 2, "ndenomin": 2, "nweight": 2, "312": 2, "316": 2, "856": 2, "antidilut": 2, "tunreal": 2, "ngain": 2, "tfair": 2, "nvalu": 2, "tcash": 2, "nequival": 2, "tcurrent": 2, "tnon": 2, "t27": 2, "nlevel": 2, "nmonei": 2, "t778": 2, "nmutual": 2, "n515": 2, "t105": 2, "t617": 2, "nsubtot": 2, "293": 2, "395": 2, "nu": 2, "treasuri": 2, "516": 2, "t212": 2, "087": 2, "380": 2, "agenc": 2, "159": 2, "t703": 2, "t17": 2, "568": 2, "158": 2, "810": 2, "ncertif": 2, "deposit": 2, "t873": 2, "t387": 2, "t478": 2, "066": 2, "ncorpor": 2, "t65": 2, "622": 2, "t270": 2, "953": 2, "939": 2, "027": 2, "t47": 2, "886": 2, "nmunicip": 2, "t412": 2, "t405": 2, "t190": 2, "nmortgag": 2, "595": 2, "t175": 2, "403": 2, "t23": 2, "367": 2, "278": 2, "t132": 2, "t583": 2, "635": 2, "t128": 2, "056": 2, "966": 2, "t34": 2, "t160": 2, "t688": 2, "650": 2, "36": 2, "359": 2, "t481": 2, "n442": 2, "t428": 2, "t923": 2, "t909": 2, "406": 2, "114": 2, "468": 2, "136": 2, "t271": 2, "533": 2, "048": 2, "491": 2, "332": 2, "t320": 2, "t608": 2, "t76": 2, "840": 2, "956": 2, "890": 2, "t20": 2, "627": 2, "243": 2, "t628": 2, "t602": 2, "t192": 2, "t410": 2, "735": 2, "636": 2, "t344": 2, "t144": 2, "470": 2, "657": 2, "831": 2, "125": 2, "162": 2, "t173": 2, "752": 2, "quot": 2, "corrobor": 2, "mortgag": 2, "classifi": 2, "37": 2, "cross": 2, "swap": 2, "remeasur": 2, "notion": 2, "069": 2, "730": 2, "575": 2, "493": 2, "t104": 2, "777": 2, "nhedg": 2, "433": 2, "505": 2, "247": 2, "ntrade": 2, "41": 2, "44": 2, "depreci": 2, "nland": 2, "690": 2, "nmachineri": 2, "t80": 2, "205": 2, "314": 2, "nleasehold": 2, "839": 2, "128": 2, "599": 2, "73": 2, "70": 2, "884": 2, "852": 2, "t55": 2, "335": 2, "906": 2, "601": 2, "703": 2, "010": 2, "457": 2, "634": 2, "391": 2, "neuropean": 2, "opinion": 2, "1991": 2, "2007": 2, "irish": 2, "branch": 2, "2003": 2, "2014": 2, "2015": 2, "request": [2, 3, 4], "minist": 2, "juli": 2, "annul": 2, "ecj": 2, "hear": 2, "asid": 2, "confirm": 2, "via": [2, 4], "unrecogn": 2, "nfeder": 2, "571": 2, "080": 2, "644": 2, "265": 2, "801": 2, "726": 2, "570": 2, "298": 2, "49": 2, "t84": 2, "428": 2, "603": 2, "483": 2, "t347": 2, "t669": 2, "076": 2, "830": 2, "419": 2, "072": 2, "pretax": 2, "72": 2, "71": 2, "ncomput": 2, "885": 2, "012": 2, "124": 2, "518": 2, "nimpact": 2, "n10": 2, "246": 2, "311": 2, "366": 2, "397": 2, "153": 2, "nexcess": 2, "893": 2, "871": 2, "192": 2, "739": 2, "ntax": 2, "carryforward": 2, "302": 2, "naccru": 2, "413": 2, "421": 2, "nunreal": 2, "173": 2, "168": 2, "873": 2, "743": 2, "nless": 2, "374": 2, "007": 2, "369": 2, "551": 2, "998": 2, "nright": 2, "179": 2, "nminimum": 2, "674": 2, "940": 2, "t511": 2, "t455": 2, "t490": 2, "805": 2, "202": 2, "indefinit": 2, "temporari": 2, "727": 2, "044": 2, "284": 2, "ndecreas": 2, "386": 2, "463": 2, "982": 2, "542": 2, "936": 2, "070": 2, "expir": 2, "statut": 2, "229": 2, "494": 2, "closur": 2, "intercompani": 2, "exceed": 2, "multiyear": 2, "exercis": 2, "noncash": 2, "rou": 2, "tfinanci": 2, "t2024": 2, "tother": 2, "661": 2, "tproperti": 2, "015": 2, "303": 2, "676": 2, "t165": 2, "t752": 2, "t859": 2, "430": 2, "842": 2, "tfinanc": 2, 
"n2025": 2, "820": 2, "t171": 2, "991": 2, "n2026": 2, "914": 2, "n2027": 2, "t59": 2, "733": 2, "n2028": 2, "360": 2, "t38": 2, "398": 2, "n2029": 2, "187": 2, "nthereaft": 2, "t837": 2, "undiscount": 2, "790": 2, "imput": 2, "376": 2, "534": 2, "t896": 2, "weight": 2, "borrow": 2, "implicit": 2, "readili": 2, "42": 2, "proce": 2, "nine": 2, "00": 2, "nmatur": 2, "333": 2, "264": 2, "948": 2, "645": 2, "309": 2, "arrear": 2, "namount": 2, "n2013": 2, "nfix": 2, "2062": 2, "t97": 2, "341": 2, "03": 2, "65": 2, "t106": 2, "572": 2, "n97": 2, "nunamort": 2, "premium": 2, "321": 2, "358": 2, "113": 2, "662": 2, "convert": [2, 4], "930": 2, "342": 2, "800": 2, "180": 2, "43": 2, "88": 2, "ndure": 2, "425": 2, "426": 2, "372": 2, "589": 2, "055": 2, "appreci": 2, "four": 2, "holder": 2, "n2014": 2, "bonu": 2, "nrestrict": 2, "nnumber": 2, "nrsu": 2, "ngrant": 2, "naggreg": 2, "nfair": 2, "nbalanc": 2, "t240": 2, "427": 2, "t75": 2, "t150": 2, "861": 2, "501": 2, "768": 2, "87": 2, "101": 2, "878": 2, "144": 2, "t127": 2, "t135": 2, "91": 2, "456": 2, "78": 2, "59": 2, "t140": 2, "80": 2, "326": 2, "t158": 2, "204": 2, "350": 2, "002": [2, 3], "nuncondit": 2, "uncondit": 2, "206": 2, "440": 2, "156": 2, "t633": 2, "t670": 2, "226": 2, "45": 2, "nconting": 2, "least": 2, "accrual": 2, "nconcentr": 2, "attribut": [2, 4], "46": 2, "t67": 2, "098": 2, "082": 2, "062": 2, "569": 2, "895": 2, "458": 2, "207": 2, "nonrecur": 2, "t142": 2, "196": 2, "t138": 2, "t147": 2, "859": 2, "nchina": 2, "n66": 2, "t181": 2, "887": 2, "t172": 2, "269": 2, "nlong": 2, "664": 2, "n4": 2, "797": 2, "778": 2, "219": 2, "47": 2, "nopinion": 2, "nwe": 2, "fairli": 2, "pcaob": 2, "criteria": 2, "sponsor": 2, "treadwai": 2, "2013": 2, "unqualifi": 2, "thereon": 2, "nthese": 2, "misstat": 2, "fraud": 2, "alter": 2, "ndescript": 2, "naudit": 2, "nhow": 2, "nmatter": 2, "qualifi": 2, "letter": 2, "advisor": 2, "ernst": 2, "young": 2, "llp": 2, "auditor": 2, "2009": 2, "nsan": 2, "jose": 2, "nnovemb": 2, "coso": 2, "nour": 2, "ndefinit": 2, "pertain": 2, "mainten": 2, "accur": [2, 4], "disposit": 2, "receipt": 2, "degre": 2, "nevalu": 2, "nbase": 2, "supervis": 2, "13a": 2, "15d": 2, "summar": [2, 3], "ninher": 2, "met": 2, "appear": [2, 4], "paragraph": 2, "51": [2, 4], "ninsid": 2, "deirdr": 2, "brien": 2, "vice": 2, "presid": 2, "affirm": 2, "april": 2, "withhold": 2, "remitt": 2, "jeff": 2, "william": 2, "mr": 2, "insid": 2, "copi": [2, 3], "exhibit": 2, "solicit": 2, "document": [2, 3, 4], "id": 2, "00042": 2, "nincorpor": 2, "texhibit": 2, "descript": [2, 4], "tform": 2, "tfile": 2, "nrestat": 2, "n8": 2, "namend": 2, "bylaw": 2, "nindentur": 2, "york": [2, 4], "mellon": 2, "truste": 2, "noffic": 2, "certif": 2, "2018": 2, "85": 2, "2043": 2, "05": 2, "2044": 2, "februari": 2, "55": 2, "2045": 2, "900": 2, "700": 2, "60": 2, "250": 2, "2036": 2, "2046": 2, "450": 2, "2047": 2, "2049": 2, "2030": 2, "2050": 2, "2060": 2, "2028": 2, "2041": 2, "2051": 2, "2061": 2, "2032": 2, "2052": 2, "54": 2, "2033": 2, "2053": 2, "n9": 2, "ceo": 2, "n12": 2, "nsubsidiari": 2, "n23": 2, "nconsent": 2, "n24": 2, "npower": 2, "signatur": 2, "nrule": 2, "nsection": 2, "1350": 2, "n101": 2, "ninlin": 2, "xbrl": 2, "n104": 2, "inlin": 2, "compensatori": 2, "herewith": 2, "furnish": 2, "herebi": 2, "undertak": 2, "56": 2, "nsignatur": 2, "npursuant": 2, "duli": 2, "sign": 2, "undersign": 2, "thereunto": 2, "ndate": 2, "nby": 2, "luca": [2, 4], "maestri": 2, "nluca": 2, "nsenior": 2, "nchief": 2, "nknow": 2, "THESE": 2, "whose": 2, 
"constitut": 2, "appoint": 2, "timothi": 2, "cook": 2, "jointli": 2, "hi": [2, 4], "her": 2, "substitut": 2, "him": 2, "thereto": 2, "therewith": 2, "ratifi": 2, "said": 2, "done": [2, 4], "virtu": 2, "hereof": 2, "nname": 2, "ttitl": 2, "tdate": 2, "tchief": 2, "tnovemb": 2, "ntimothi": 2, "tsenior": 2, "chri": 2, "kondo": 2, "nchri": 2, "wanda": 2, "austin": 2, "nwanda": 2, "alex": 2, "gorski": 2, "tdirector": 2, "nalex": 2, "andrea": 2, "jung": 2, "nandrea": 2, "arthur": 2, "levinson": 2, "narthur": 2, "monica": 2, "lozano": 2, "nmonica": 2, "ronald": 2, "sugar": 2, "nronald": 2, "susan": 2, "l": 2, "wagner": 2, "nsusan": 2, "57": 2, "gpt": [2, 3, 4], "turbo": [2, 3, 4], "invdestacksmeticsisdict": 2, "setispect": 2, "20cyan": 2, "evaluationseld": 2, "anvis": 2, "droitent": 2, "discernminerv": 2, "versbobprefvers": 2, "vo\u8be5": 2, "option\u548c": 2, "meio": 2, "\u0432\u0440\u0435\u043ccisco": 2, "dellaischenpoihscap": 2, "geme": 2, "gettim": 2, "unscal": 2, "score": [2, 4], "vocabulari": 2, "closer": 2, "sharpen": 2, "uniform": 2, "raschka": 2, "simpl": [2, 3, 4], "dramat": [2, 4], "systemat": [2, 4], "At": 2, "rigid": 2, "wildli": 2, "radic": 2, "grappl": 2, "probabilist": 2, "seem": [2, 4], "safer": 2, "don": [2, 3, 4], "highlight": [2, 3, 4], "paradigm": 2, "anoth": 2, "fascin": 2, "spontan": 2, "answer": [2, 3, 4], "aren": 2, "explicitli": 2, "clear": [2, 4], "wei": 2, "fig": [2, 3, 4], "linear": 2, "absent": 2, "simpli": [2, 3, 4], "coax": 2, "onc": [2, 3], "reach": [2, 3, 4], "journei": 2, "suddenli": 2, "manifest": 2, "call": [2, 3, 4], "phase": 2, "stark": 2, "deliber": 2, "convent": 2, "stabl": 2, "suit": 2, "contend": 2, "7b": 2, "70b": 2, "rethink": 2, "math": 2, "tutor": 2, "children": 2, "verifi": [2, 4], "just": [2, 3, 4], "predefin": [2, 4], "adapt": [2, 3], "explan": [2, 4], "child": 2, "ag": 2, "bound": 2, "weren": 2, "accuraci": [2, 4], "kind": 2, "dimens": 2, "pre": 2, "explicit": [2, 4], "usual": 2, "precis": [2, 4], "resist": 2, "straightforward": [2, 3, 4], "quantif": 2, "contamin": 2, "carefulli": [2, 4], "craft": [2, 4], "massiv": 2, "alreadi": 2, "seen": 2, "memor": 2, "truli": 2, "unseen": 2, "rigor": 2, "evolut": 2, "longitudin": 2, "autom": [2, 4], "annot": 2, "mostli": [2, 4], "versu": 2, "latter": 2, "foundat": [2, 3], "tailor": 2, "solv": [2, 4], "great": [2, 4], "why": [2, 4], "misinform": 2, "factual": 2, "databas": [2, 4], "citat": 2, "tempor": 2, "scientif": 2, "fals": [2, 4], "manipul": 2, "medic": 2, "disclaim": 2, "referr": 2, "boundari": 2, "situat": [2, 3], "incorrect": 2, "expertis": 2, "bia": [2, 4], "gender": 2, "racial": 2, "demograph": 2, "stereotyp": 2, "reinforc": 2, "societ": 2, "pii": 2, "anonym": 2, "leakag": 2, "carryov": 2, "protocol": 2, "cognit": 2, "multi": [2, 4], "mathemat": 2, "fallaci": 2, "causal": 2, "edg": 2, "think": 2, "idiom": 2, "sarcasm": 2, "terminologi": 2, "lingual": 2, "misunderstand": 2, "syntax": 2, "scan": 2, "compat": [2, 4], "stabil": 2, "effici": [2, 3, 4], "scalabl": [2, 3], "meta": [2, 3], "overconfid": 2, "clariti": [2, 3, 4], "audienc": 2, "densiti": 2, "satisfact": [2, 4], "misus": 2, "moral": 2, "transpar": [2, 4], "co2": 2, "energi": 2, "consumpt": 2, "server": [2, 4], "batch": 2, "infer": 2, "imag": 2, "audio": 2, "etc": [2, 4], "truth": [2, 4], "layer": [2, 3, 4], "palm": 2, "shown": 2, "quantifi": 2, "rank": 2, "easi": [2, 3], "synthet": [2, 4], "post": [2, 4], "timeout": 2, "variat": 2, "maxim": 2, "inter": 2, "rater": 2, "priorit": 2, "ti": 2, "tier": 2, "holist": 2, "built": [2, 4], "mind": 2, 
"x": 2, "fast": 2, "experiment": [2, 4], "iter": [2, 3, 4], "vi": 2, "later": [2, 4], "categor": [2, 4], "intrins": 2, "extrins": 2, "sequenc": [2, 4], "perplex": 2, "downstream": [2, 4], "valuabl": [2, 4], "distinguish": 2, "classif": [2, 4], "true": [2, 3, 4], "synthesi": 2, "discret": 2, "f1": 2, "match": [2, 4], "prefix": 2, "roug": 2, "bleu": 2, "charact": [2, 3, 4], "gram": 2, "bilingu": 2, "understudi": 2, "overlap": [2, 3], "favor": [2, 4], "breviti": 2, "insensit": 2, "semant": [2, 3], "orient": 2, "gist": 2, "sentenc": [2, 3, 4], "ignor": 2, "meteor": 2, "synonym": 2, "stem": [2, 4], "paraphras": 2, "alongsid": 2, "computation": [2, 3], "cider": 2, "consensu": 2, "tf": 2, "idf": 2, "caption": 2, "reliant": 2, "corpu": 2, "statist": 2, "ter": 2, "edit": 2, "hypothesi": 2, "penal": 2, "bertscor": 2, "embed": [2, 3], "bert": 2, "spice": 2, "proposit": 2, "scene": 2, "emphasi": 2, "pure": 2, "analyst": [2, 3], "dictionari": [2, 4], "rouge_1": 2, "rouge_2": 2, "ideal": [2, 4], "expert": [2, 3, 4], "cheaper": 2, "4o": [2, 3, 4], "evaluate_summari": 2, "unigram": 2, "bigram": 2, "huggingfac": 2, "librari": [2, 3, 4], "absl": 2, "py": 2, "rouge_scor": 2, "generated_summari": 2, "reference_summari": 2, "arg": [2, 3, 4], "dict": [2, 3, 4], "google_bleu": 2, "bleu_scor": 2, "rouge1": 2, "rouge2": 2, "arbitrari": 2, "chosen": 2, "sentence1": 2, "cat": 2, "sat": 2, "mat": 2, "sentence2": 2, "ate": 2, "3333333333333333": 2, "7272727272727272": 2, "4444444444444445": 2, "generate_summari": 2, "summir": 2, "correspond": [2, 4], "liner": 2, "excerpt": 2, "evaluate_summary_model": 2, "model_benchmark": 2, "models_test": 2, "benchmark_summari": 2, "model_summari": 2, "evaluation_result": 2, "reveal": 2, "analyz": [2, 3, 4], "statu": 2, "concis": 2, "element": [2, 4], "Its": 2, "verbos": 2, "peripher": 2, "quit": [2, 4], "overli": [2, 4], "simplifi": [2, 4], "miss": 2, "convei": [2, 3], "breadth": 2, "Of": 2, "vibe": 2, "visualize_prompt_comparison": 2, "visual": 2, "matplotlib": 2, "radar": 2, "plot": 2, "radar_plot": 2, "tmp": 2, "ipykernel_1652501": 2, "940173201": 2, "userwarn": 2, "figurecanvasagg": 2, "closest": 2, "largest": 2, "deviat": [2, 4], "suggest": [2, 4], "mention": [2, 4], "nuanc": [2, 3, 4], "granular": [2, 3], "fall": 2, "judg": 2, "themselv": 2, "main": [2, 3, 4], "instruct": [2, 3, 4], "tune": [2, 4], "assign": 2, "likert": 2, "style": 2, "pairwis": 2, "ensembl": 2, "repeatedli": 2, "domain": 2, "fluenci": 2, "refin": 2, "excel": [2, 4], "narr": 2, "mirror": 2, "similarli": 2, "notabl": [2, 4], "properli": [2, 4], "henc": 2, "worth": 2, "integ": 2, "rubric": 2, "hollist": 2, "judgeevalu": 2, "grammar": [2, 4], "evaluate_with_llm": 2, "candid": 2, "pars": [2, 4], "criterion": 2, "basemodel": [2, 4], "judge_model": 2, "candidate_summari": 2, "written": 2, "grammat": 2, "y": 2, "z": 2, "w": [2, 3], "beta": [2, 4], "response_format": [2, 4], "Then": 2, "benchmark_model": 2, "test_model": 2, "input_text": [2, 3], "tupl": 2, "trillion": [2, 4], "evals_list": 2, "1775618912": 2, "variant": 2, "slightli": 2, "drift": 2, "lowest": 2, "drop": 2, "gradient": 2, "visibl": 2, "degrad": [2, 4], "firstli": 2, "overhead": 2, "neglect": 2, "prefer": [2, 4], "egocentr": 2, "tight": 2, "field": [2, 4], "aproach": 2, "workflow": [2, 4], "assessor": 2, "aplic": 2, "aim": [2, 3, 4], "clearli": [2, 4], "earlier": 2, "depict": [2, 4], "correl": 2, "multilingu": 2, "golden": 2, "languang": 2, "arena": 2, "blind": 2, "randomli": 2, "pair": 2, "loop": 2, "customiz": 2, "irrelev": 2, "unhelp": 2, "though": 
[2, 4], "occasion": 2, "rare": 2, "inaccuraci": 2, "perfectli": 2, "cater": 2, "critiqu": 2, "elo": 2, "democrat": [2, 4], "thought": [2, 4], "exam": 2, "probe": 2, "certifi": 2, "histori": 2, "move": [2, 3], "began": 2, "glue": 2, "wang": 2, "entail": 2, "baselin": 2, "superglu": 2, "deeper": [2, 3], "successor": 2, "grew": 2, "big": 2, "bench": 2, "srivastava": 2, "arithmet": 2, "truthfulqa": 2, "lin": [2, 4], "decept": 2, "multitask": 2, "hendryck": 2, "multidisciplinari": 2, "stanford": 2, "helm": 2, "liang": 2, "multidimension": 2, "surround": [2, 4], "emphas": [2, 4], "humanev": 2, "chen": [2, 4], "lmsy": 2, "brought": 2, "dialogu": 2, "len": [2, 3], "replic": [2, 4], "chatbot": 2, "chiang": 2, "gather": 2, "alpacaev": 2, "duboi": 2, "mt": 2, "zheng": 2, "Their": [2, 4], "render": 2, "crowdsourc": 2, "livebench": 2, "white": 2, "resili": 2, "meaningfulli": 2, "monthli": 2, "came": 2, "arc": 2, "prize": 2, "chollet": 2, "mike": 2, "knoop": 2, "founder": 2, "zapier": 2, "fran\u00e7oi": 2, "creator": 2, "agi": 2, "kera": 2, "meaning": [2, 3, 4], "genuin": 2, "old": 2, "possess": 2, "count": [2, 3], "elementari": 2, "novelti": 2, "puzzl": 2, "someth": 2, "wouldn": 2, "interpol": 2, "memori": [2, 3], "synthes": 2, "fly": 2, "brute": 2, "minim": [2, 4], "pixel": 2, "perfect": 2, "color": 2, "unbeaten": 2, "win": 2, "deep": 2, "poorli": 2, "recombin": 2, "spur": 2, "art": 2, "takeawai": 2, "algorithm": 2, "fourrier": 2, "lightweight": [2, 4], "bespok": 2, "sdk": 2, "cli": 2, "extract": [2, 3, 4], "autoregress": 2, "sub": 2, "liter": 2, "disturb": 2, "zero": 2, "varianc": 2, "yt": 2, "ut": 2, "suppos": 2, "exactli": [2, 4], "ol": 2, "heteroscedast": 2, "regress": 2, "wish": 2, "lag": 2, "bivari": 2, "evaluation_track": 2, "evaluationtrack": 2, "model_config": 2, "basemodelconfig": 2, "parallelismmanag": 2, "pipelineparamet": 2, "envconfig": 2, "is_accelerate_avail": 2, "datetim": 2, "timedelta": 2, "initprocessgroupkwarg": 2, "create_evaluation_pipelin": 2, "output_dir": 2, "cache_dir": 2, "pretrain": 2, "dtype": 2, "float16": 2, "max_sampl": 2, "kwargs_handl": 2, "3000": 2, "els": [2, 3], "save_detail": 2, "push_to_hub": 2, "pipeline_param": 2, "launcher_typ": 2, "env_config": 2, "override_batch_s": 2, "use_chat_templ": 2, "trust_remote_cod": 2, "pipeline_paramet": 2, "schemat": [2, 3], "vllm": [2, 4], "tgi": 2, "instanti": 2, "storag": 2, "push": 2, "hub": 2, "parallel": 2, "num_few_shot": 2, "automat": 2, "string": [2, 4], "vertic": 2, "bar": 2, "binari": 2, "flag": 2, "bigbench": 2, "winogrand": 2, "hellaswag": 2, "nlp": 2, "save_and_push_result": 2, "show_result": 2, "model_arg": 2, "remot": 2, "send": [2, 4], "serverless": 2, "inference_server_address": 2, "inference_server_auth": 2, "model_id": 2, "null": 2, "bash": 2, "command": 2, "model_config_path": 2, "path": [2, 3], "endpoint_model": 2, "yaml": [2, 4], "llama3": [2, 3], "qwen2": [2, 4], "smollm2": 2, "3b": 2, "alibaba": [2, 4], "5b": [2, 4], "hui": 2, "yang": 2, "compact": 2, "360m": 2, "allal": 2, "cluster": 2, "noteworthi": 2, "superior": 2, "grain": [2, 4], "salt": [2, 4], "give": 2, "exponenti": 2, "hug": [2, 4], "modular": 2, "visit": 2, "offici": 2, "revisit": 2, "rememb": 2, "api_kei": [2, 3], "trace": 2, "langchain_tracing_v2": 2, "langchain_api_kei": 2, "hf_evalu": 2, "langsmith_evalu": 2, "ls_client": 2, "tobia": 2, "src": 2, "lib": 2, "python3": 2, "tqdm": 2, "auto": 2, "tqdmwarn": 2, "iprogress": 2, "pleas": 2, "jupyt": 2, "ipywidget": 2, "readthedoc": 2, "en": [2, 4], "user_instal": 2, "html": [2, 3, 4], 
"autonotebook": 2, "notebook_tqdm": 2, "dataset_nam": 2, "create_dataset": 2, "create_exampl": 2, "dataset_id": 2, "calculate_scor": 2, "reference_output": 2, "oai_client": 2, "xp_model_nam": 2, "lastli": 2, "run_evalu": 2, "upload": 2, "And": 2, "upload_result": 2, "experiment_prefix": 2, "num_repetit": 2, "view": 2, "386a3620": 2, "smith": 2, "9e1cc3cb": 2, "9d6a": 2, "4356": 2, "ab34": 2, "138e0abe8be4": 2, "8741976e": 2, "5268": 2, "4b75": 2, "949f": 2, "99477dde5d64": 2, "selectedsess": 2, "b831dc1e": 2, "90bc": 2, "4ed8": 2, "8080": 2, "fb42444724d6": 2, "4it": 2, "latest": [2, 3, 4], "modul": [2, 4], "evaluate_modul": 2, "6fc70b7be0088120a372dfdd5d320b39b8bb3630cb8029b193941d9376e86bb0": 2, "tue": 2, "nov": 2, "couldn": 2, "5it": 2, "5053784e": 2, "64445871": 2, "a53c": 2, "44b1": 2, "a422": 2, "4f49b2f9656f": 2, "69": 2, "4b29f3c9": 2, "9ef7e39a": 2, "2add": 2, "410c": 2, "89f8": 2, "9f1a8b198cf1": 2, "61": 2, "df": 2, "to_panda": 2, "insert": 2, "combined_df": 2, "concat": 2, "ignore_index": 2, "execution_tim": 2, "example_id": 2, "333333": 2, "224388": 2, "feb10f92": 2, "3167": 2, "41f3": 2, "bb1c": 2, "d271153a31a8": 2, "5b196b22": 2, "9f4c": 2, "489c": 2, "b020": 2, "7823208b42d6": 2, "348101": 2, "722464": 2, "c310f159": 2, "064a": 2, "4035": 2, "97c3": 2, "a25bbf43abc2": 2, "386076": 2, "704104": 2, "f7f24899": 2, "dd50": 2, "409e": 2, "93cc": 2, "6fb1622b60bf": 2, "443038": 2, "725059": 2, "242856d6": 2, "efb5": 2, "4101": 2, "b1cf": 2, "5805532838ac": 2, "373418": 2, "795302": 2, "ce975169": 2, "a0ab": 2, "40ce": 2, "8e32": 2, "efa28d06079d": 2, "stat": 2, "groupbi": 2, "agg": 2, "std": 2, "round": 2, "sort": 2, "sort_valu": 2, "figur": [2, 4], "subplot": 2, "side": 2, "pyplot": 2, "plt": 2, "numpi": 2, "np": 2, "ax1": 2, "ax2": 2, "figsiz": 2, "2ecc71": 2, "3498db": 2, "e74c3c": 2, "bleu_mean": 2, "bleu_std": 2, "enumer": [2, 3], "errorbar": 2, "yerr": 2, "fmt": 2, "markers": 2, "capsiz": 2, "label": [2, 4], "alpha": 2, "set_ylabel": 2, "set_titl": 2, "set_xtick": 2, "set_xticklabel": 2, "rotat": 2, "set_ylim": 2, "bottom": 2, "axi": 2, "legend": 2, "grid": 2, "exec_mean": 2, "exec_std": 2, "tight_layout": 2, "ndetail": 2, "4038": 2, "0453": 2, "7815": 2, "0433": 2, "3768": 2, "0424": 2, "8343": 2, "2208": 2, "3519": 2, "0775": 2, "9122": 2, "1482": 2, "377": 2, "042": 2, "83": 2, "078": 2, "slower": 2, "fastest": 2, "04": [2, 3], "latenc": [2, 3], "speed": 2, "interestingli": 2, "longer": 2, "alb": 2, "loubna": 2, "ben": 2, "anton": 2, "lozhkov": 2, "eli": 2, "bakouch": 2, "gabriel": 2, "mart\u00edn": 2, "bl\u00e1zquez": 2, "lewi": 2, "tunstal": 2, "agust\u00edn": 2, "piquer": 2, "andr": 2, "marafioti": 2, "cyril": 2, "zakka": 2, "leandro": 2, "von": 2, "werra": 2, "thoma": 2, "wolf": 2, "are24": 2, "judgearena": 2, "ctj": 2, "jerri": 2, "tworek": 2, "heewoo": 2, "jun": 2, "qime": 2, "yuan": 2, "henriqu": 2, "pond": 2, "de": 2, "oliveira": 2, "pinto": 2, "jare": 2, "kaplan": 2, "harri": 2, "edward": 2, "yuri": 2, "burda": 2, "nichola": 2, "joseph": 2, "greg": 2, "brockman": 2, "rai": 2, "raul": 2, "puri": 2, "gretchen": 2, "krueger": 2, "michael": [2, 4], "petrov": 2, "heidi": 2, "khlaaf": 2, "girish": 2, "sastri": 2, "pamela": 2, "mishkin": 2, "brook": 2, "chan": 2, "scott": 2, "grai": 2, "nick": 2, "ryder": 2, "mikhail": 2, "pavlov": 2, "alethea": 2, "lukasz": 2, "kaiser": 2, "mohammad": 2, "bavarian": 2, "clemen": 2, "winter": 2, "philipp": 2, "tillet": 2, "felip": 2, "petroski": 2, "dave": 2, "cum": 2, "matthia": 2, "plappert": 2, "fotio": 2, "chantzi": 2, 
"elizabeth": 2, "barn": 2, "ariel": 2, "herbert": 2, "voss": 2, "hebgen": 2, "guss": 2, "nichol": 2, "paino": 2, "nikola": 2, "tezak": 2, "jie": 2, "tang": 2, "igor": 2, "babuschkin": 2, "suchir": 2, "balaji": 2, "shantanu": 2, "jain": 2, "saunder": 2, "christoph": 2, "hess": 2, "andrew": 2, "carr": 2, "jan": 2, "leik": 2, "josh": 2, "achiam": 2, "vedant": 2, "misra": 2, "evan": 2, "morikawa": 2, "alec": 2, "radford": 2, "matthew": 2, "knight": 2, "mile": 2, "brundag": 2, "mira": 2, "murati": 2, "kati": 2, "mayer": 2, "peter": 2, "welind": 2, "bob": [2, 4], "mcgrew": 2, "dario": 2, "amodei": 2, "sam": 2, "mccandlish": 2, "ilya": 2, "sutskev": 2, "wojciech": 2, "zaremba": 2, "arxiv": [2, 4], "org": [2, 4], "ab": [2, 4], "2107": 2, "03374": 2, "cz": 2, "lianmin": 2, "ying": 2, "sheng": 2, "anastasio": 2, "angelopoulo": 2, "tianl": 2, "dacheng": 2, "hao": 2, "zhang": 2, "banghua": 2, "zhu": 2, "jordan": 2, "gonzalez": 2, "ion": 2, "stoica": 2, "2403": 2, "04132": 2, "cho24a": 2, "francoi": 2, "arcpriz": 2, "cho24b": 2, "dglh24": 2, "yann": 2, "bal\u00e1z": 2, "galambosi": 2, "perci": 2, "tatsunori": 2, "hashimoto": 2, "debia": 2, "2404": 2, "04475": 2, "fac24a": 2, "wiki": [2, 4], "fac24b": 2, "fac24c": 2, "doc": [2, 3, 4], "model_doc": 2, "gpt2": 2, "fac24d": 2, "cookbook": 2, "llm_judg": 2, "fac24": 2, "fac24f": 2, "blog": 2, "fhwt23": 2, "cl\u00e9mentin": 2, "nathan": 2, "habib": 2, "hbb": 2, "dan": 2, "collin": 2, "burn": 2, "steven": 2, "basart": 2, "andi": 2, "zou": 2, "manta": 2, "mazeika": 2, "dawn": 2, "song": 2, "jacob": 2, "steinhardt": 2, "03300": 2, "hbd": 2, "ari": 2, "du": 2, "maxwel": 2, "forb": 2, "yejin": 2, "choi": 2, "curiou": 2, "neural": [2, 4], "degener": 2, "1904": 2, "09751": 2, "hyc": 2, "binyuan": 2, "jian": 2, "zeyu": 2, "cui": 2, "jiaxi": 2, "dayiheng": 2, "liu": [2, 4], "lei": 2, "tianyu": 2, "jiajun": 2, "bowen": 2, "yu": 2, "kai": 2, "dang": 2, "coder": 2, "preprint": [2, 4], "2409": 2, "12186": 2, "lx": 2, "zhen": 2, "xiaohan": 2, "xu": 2, "tao": 2, "shen": 2, "jia": 2, "gu": 2, "yuxuan": 2, "lai": 2, "chongyang": 2, "shuai": 2, "ma": 2, "nlg": 2, "2401": 2, "07103": 2, "lbl": 2, "rishi": 2, "bommasani": 2, "toni": 2, "lee": [2, 4], "dimitri": 2, "tsipra": 2, "dilara": 2, "soylu": 2, "michihiro": 2, "yasunaga": 2, "yian": 2, "deepak": 2, "narayanan": 2, "yuhuai": 2, "wu": [2, 4], "ananya": 2, "kumar": 2, "benjamin": 2, "newman": 2, "binhang": 2, "bobbi": 2, "yan": 2, "ce": 2, "christian": 2, "cosgrov": 2, "r\u00e9": 2, "diana": 2, "acosta": 2, "nava": 2, "drew": 2, "hudson": 2, "eric": 2, "zelikman": 2, "esin": 2, "durmu": 2, "faisal": 2, "ladhak": 2, "frieda": 2, "rong": 2, "hongyu": 2, "ren": 2, "huaxiu": 2, "yao": 2, "jue": 2, "keshav": 2, "santhanam": 2, "laurel": 2, "orr": 2, "lucia": 2, "mert": 2, "yuksekgonul": 2, "mirac": 2, "suzgun": 2, "kim": 2, "neel": 2, "guha": 2, "niladri": 2, "chatterji": 2, "omar": 2, "khattab": 2, "henderson": 2, "qian": 2, "huang": 2, "ryan": 2, "chi": [2, 4], "sang": 2, "xie": 2, "shibani": 2, "santurkar": 2, "surya": 2, "ganguli": 2, "icard": 2, "tianyi": 2, "vishrav": 2, "chaudhari": 2, "xuechen": 2, "yifan": 2, "yuhui": 2, "yuta": 2, "koreeda": 2, "2211": 2, "09110": 2, "lhe22": 2, "stephani": 2, "hilton": 2, "owain": 2, "mimic": 2, "falsehood": 2, "2109": 2, "07958": 2, "ras24": 2, "sebastian": 2, "scratch": 2, "isbn": 2, "1633437166": 2, "srr": 2, "aarohi": 2, "abhinav": 2, "rastogi": 2, "abhishek": 2, "rao": 2, "abu": 2, "awal": 2, "md": [2, 4], "shoeb": 2, "abubakar": 2, "abid": 2, "adam": 2, "fisch": 2, "brown": 2, 
"santoro": 2, "aditya": 2, "gupta": 2, "adri\u00e0": 2, "garriga": 2, "alonso": 2, "agnieszka": 2, "kluska": 2, "aitor": 2, "lewkowycz": 2, "akshat": 2, "agarw": 2, "warstadt": 2, "alexand": [2, 4], "kocurek": 2, "ali": 2, "safaya": 2, "tazarv": 2, "alic": [2, 4], "xiang": 2, "alicia": 2, "parrish": 2, "allen": 2, "nie": 2, "aman": 2, "hussain": 2, "amanda": 2, "askel": 2, "dsouza": 2, "ambros": 2, "slone": 2, "ameet": 2, "rahan": 2, "anantharaman": 2, "iyer": 2, "ander": 2, "andreassen": 2, "madotto": 2, "santilli": 2, "stuhlm\u00fcller": 2, "la": 2, "lampinen": 2, "angela": 2, "jiang": 2, "angelica": 2, "anh": 2, "vuong": 2, "animesh": 2, "anna": 2, "gottardi": 2, "antonio": 2, "norelli": 2, "anu": 2, "venkatesh": 2, "arash": 2, "gholamidavoodi": 2, "arfa": 2, "tabassum": 2, "arul": 2, "menez": 2, "arun": 2, "kirubarajan": 2, "asher": 2, "mullokandov": 2, "ashish": 2, "sabharw": 2, "herrick": 2, "avia": 2, "efrat": 2, "aykut": 2, "erdem": 2, "ayla": 2, "karaka\u015f": 2, "robert": 2, "bao": 2, "loe": 2, "barret": 2, "zoph": 2, "bart\u0142omiej": 2, "bojanowski": 2, "batuhan": 2, "\u00f6zyurt": 2, "behnam": 2, "hedayatnia": 2, "neyshabur": 2, "inden": 2, "benno": 2, "stein": 2, "berk": 2, "ekmekci": 2, "yuchen": 2, "blake": 2, "howald": 2, "bryan": 2, "orinion": 2, "cameron": [2, 4], "diao": 2, "dour": 2, "catherin": 2, "stinson": 2, "cedrick": 2, "argueta": 2, "c\u00e9sar": 2, "ferri": 2, "ram\u00edrez": 2, "chandan": 2, "singh": 2, "charl": 2, "rathkopf": 2, "chenlin": 2, "meng": 2, "chitta": 2, "baral": 2, "chiyu": 2, "callison": 2, "burch": 2, "wait": 2, "voigt": 2, "pott": 2, "cindi": 2, "ramirez": 2, "clara": 2, "rivera": 2, "clemencia": 2, "siro": 2, "colin": 2, "raffel": 2, "courtnei": 2, "ashcraft": 2, "cristina": 2, "garbacea": 2, "damien": 2, "sileo": 2, "garrett": 2, "kilman": 2, "roth": 2, "daniel": 2, "freeman": 2, "khashabi": 2, "levi": 2, "mosegu\u00ed": 2, "gonz\u00e1lez": 2, "perszyk": 2, "danni": 2, "hernandez": 2, "danqi": 2, "daphn": 2, "ippolito": 2, "dar": 2, "gilboa": 2, "david": 2, "dohan": 2, "drakard": 2, "jurgen": 2, "debajyoti": 2, "datta": 2, "deni": 2, "emelin": 2, "kleyko": 2, "deniz": 2, "yuret": 2, "derek": 2, "tam": [2, 4], "dieuwk": 2, "hupk": 2, "diganta": 2, "dilyar": 2, "buzan": 2, "coelho": 2, "mollo": 2, "diyi": 2, "dong": 2, "ho": 2, "dylan": 2, "schrader": 2, "ekaterina": 2, "shutova": 2, "ekin": 2, "dogu": 2, "cubuk": 2, "elad": 2, "segal": 2, "eleanor": 2, "hagerman": 2, "donowai": 2, "elli": 2, "pavlick": 2, "emanuel": 2, "rodola": 2, "emma": 2, "lam": 2, "chu": 2, "erkut": 2, "erni": 2, "ethan": 2, "dyer": 2, "jerzak": 2, "eunic": 2, "engefu": 2, "manyasi": 2, "evgenii": 2, "zheltonozhskii": 2, "fanyu": 2, "xia": 2, "fatemeh": 2, "siar": 2, "fernando": 2, "mart\u00ednez": 2, "plume": 2, "francesca": 2, "happ\u00e9": 2, "gaurav": 2, "mishra": 2, "genta": 2, "indra": 2, "winata": 2, "gerard": 2, "melo": 2, "germ\u00e1n": 2, "kruszewski": 2, "giambattista": 2, "parascandolo": 2, "giorgio": 2, "mariani": 2, "gloria": 2, "gonzalo": 2, "jaimovitch": 2, "l\u00f3pez": 2, "gregor": 2, "betz": 2, "gui": 2, "gur": 2, "hana": 2, "galijasev": 2, "hannah": 2, "rashkin": 2, "hannaneh": 2, "hajishirzi": 2, "harsh": 2, "mehta": 2, "hayden": 2, "bogar": 2, "henri": 2, "shevlin": 2, "hinrich": 2, "sch\u00fctze": 2, "hiromu": 2, "yakura": 2, "hongm": 2, "hugh": 2, "mee": 2, "wong": 2, "ian": 2, "ng": 2, "isaac": 2, "nobl": 2, "jaap": 2, "jumelet": 2, "jack": 2, "geissing": 2, "jackson": 2, "kernion": 2, "jaehoon": 2, "jaim": 2, "fern\u00e1ndez": 2, "fisac": 2, 
"jame": 2, "simon": 2, "koppel": 2, "koco\u0144": 2, "jana": 2, "thompson": 2, "janel": 2, "wingfield": 2, "jarema": 2, "radom": 2, "jascha": 2, "sohl": 2, "dickstein": 2, "jason": 2, "phang": 2, "yosinski": 2, "jekaterina": 2, "novikova": 2, "jell": 2, "bosscher": 2, "jennif": 2, "marsh": 2, "jeremi": 2, "jeroen": 2, "taal": 2, "jess": 2, "engel": 2, "jesujoba": 2, "alabi": 2, "jiacheng": 2, "jiam": 2, "jillian": 2, "joan": 2, "waweru": 2, "john": 2, "burden": 2, "miller": 2, "bali": 2, "jonathan": 2, "batcheld": 2, "berant": 2, "j\u00f6rg": 2, "frohberg": 2, "jo": 2, "rozen": 2, "orallo": 2, "boudeman": 2, "guerr": 2, "joshua": 2, "tenenbaum": 2, "joyc": 2, "chua": 2, "kamil": 2, "kanclerz": 2, "karen": 2, "livescu": 2, "karl": 2, "krauth": 2, "karthik": 2, "gopalakrishnan": 2, "katerina": 2, "ignatyeva": 2, "katja": 2, "markert": 2, "kaustubh": 2, "dhole": 2, "kevin": 2, "gimpel": 2, "omondi": 2, "kori": 2, "mathewson": 2, "kristen": 2, "chiafullo": 2, "ksenia": 2, "shkaruta": 2, "shridhar": 2, "kyle": 2, "mcdonel": 2, "richardson": 2, "laria": 2, "reynold": 2, "leo": 2, "gao": 2, "liam": 2, "dugan": 2, "lianhui": 2, "qin": 2, "lidia": 2, "contrera": 2, "ochando": 2, "loui": 2, "morenc": 2, "moschella": 2, "luci": 2, "ludwig": 2, "schmidt": 2, "luheng": 2, "lui": 2, "olivero": 2, "col\u00f3n": 2, "luke": 2, "metz": 2, "l\u00fctfi": 2, "kerem": 2, "\u015fenel": 2, "maarten": 2, "bosma": 2, "sap": 2, "maartj": 2, "hoev": 2, "maheen": 2, "farooqi": 2, "manaal": 2, "faruqui": 2, "marco": 2, "baturan": 2, "marelli": 2, "maru": 2, "maria": 2, "quintana": 2, "mari": 2, "tolkiehn": 2, "mario": 2, "giulianelli": 2, "martha": 2, "martin": 2, "potthast": 2, "leavitt": 2, "hagen": 2, "m\u00e1ty\u00e1": 2, "schubert": 2, "medina": 2, "orduna": 2, "baitemirova": 2, "melodi": 2, "arnaud": 2, "melvin": 2, "mcelrath": 2, "yee": 2, "cohen": 2, "ivanitskii": 2, "starritt": 2, "strube": 2, "micha\u0142": 2, "sw\u0119drowski": 2, "michel": 2, "bevilacqua": 2, "mihir": 2, "kale": 2, "cain": 2, "mime": 2, "mitch": 2, "walker": 2, "mo": 2, "tiwari": 2, "mohit": 2, "bansal": 2, "moin": 2, "aminnaseri": 2, "mor": 2, "geva": 2, "mozhdeh": 2, "gheini": 2, "mukund": 2, "varma": 2, "nanyun": 2, "peng": 2, "nayeon": 2, "neta": 2, "krakov": 2, "doiron": 2, "nicol": 2, "martinez": 2, "nikita": 2, "nangia": 2, "nikla": 2, "decker": 2, "muennighoff": 2, "nitish": 2, "shirish": 2, "keskar": 2, "niveditha": 2, "noah": 2, "constant": 2, "fiedel": 2, "nuan": 2, "wen": 2, "oliv": 2, "agha": 2, "elbaghdadi": 2, "omer": 2, "moreno": 2, "casar": 2, "parth": 2, "doshi": 2, "pascal": 2, "fung": 2, "paul": 2, "pu": 2, "vicol": 2, "pegah": 2, "alipoormolabashi": 2, "peiyuan": 2, "liao": 2, "eckerslei": 2, "phu": 2, "mon": 2, "htut": 2, "pinyu": 2, "hwang": 2, "piotr": 2, "mi\u0142kowski": 2, "piyush": 2, "patil": 2, "pouya": 2, "pezeshkpour": 2, "priti": 2, "oli": 2, "qiaozhu": 2, "mei": 2, "qing": 2, "lyu": 2, "qinlang": 2, "rabin": 2, "banjad": 2, "rachel": 2, "etta": 2, "rudolph": 2, "raefer": 2, "rahel": 2, "haback": 2, "ramon": 2, "risco": 2, "rapha\u00ebl": 2, "milli\u00e8r": 2, "rhythm": 2, "garg": 2, "rif": 2, "saurou": 2, "riku": 2, "arakawa": 2, "robb": 2, "raymaek": 2, "frank": 2, "rohan": 2, "sikand": 2, "roman": 2, "novak": 2, "sitelew": 2, "ronan": 2, "lebra": 2, "rosann": 2, "rowan": 2, "rui": [2, 4], "ruslan": 2, "salakhutdinov": 2, "stoval": 2, "teehan": 2, "rylan": 2, "sahib": 2, "saif": 2, "sajant": 2, "anand": 2, "dillav": 2, "shleifer": 2, "wiseman": 2, "samuel": 2, "gruetter": 2, "bowman": 2, "schoenholz": 2, 
"sanghyun": 2, "han": 2, "sanjeev": 2, "kwatra": 2, "sarah": 2, "sarik": 2, "ghazarian": 2, "sayan": 2, "ghosh": 2, "sean": 2, "casei": 2, "bischoff": 2, "gehrmann": 2, "schuster": 2, "sepideh": 2, "sadeghi": 2, "shadi": 2, "hamdan": 2, "sharon": 2, "zhou": 2, "shashank": 2, "sherri": 2, "shi": 2, "shikhar": 2, "shima": 2, "asaadi": 2, "shixiang": 2, "shane": 2, "shubh": 2, "pachchigar": 2, "shubham": 2, "toshniw": 2, "shyam": 2, "upadhyai": 2, "shyamolima": 2, "debnath": 2, "siamak": 2, "shakeri": 2, "thormey": 2, "melzi": 2, "siva": 2, "reddi": 2, "sneha": 2, "priscilla": 2, "makini": 2, "soo": 2, "hwan": 2, "spencer": 2, "toren": 2, "sriharsha": 2, "hatwar": 2, "stanisla": 2, "dehaen": 2, "stefan": 2, "divic": 2, "stefano": 2, "ermon": 2, "stella": 2, "biderman": 2, "stephen": 2, "prasad": 2, "piantadosi": 2, "stuart": 2, "shieber": 2, "summer": 2, "misherghi": 2, "svetlana": 2, "kiritchenko": 2, "swaroop": 2, "tal": 2, "linzen": 2, "tariq": 2, "tatsu": 2, "te": 2, "th\u00e9o": 2, "desbord": 2, "theodor": 2, "rothschild": 2, "phan": 2, "tiberiu": 2, "nkinyili": 2, "timo": 2, "schick": 2, "timofei": 2, "kornev": 2, "titu": 2, "tunduni": 2, "gerstenberg": 2, "trenton": 2, "trishala": 2, "neeraj": 2, "tushar": 2, "khot": 2, "tyler": 2, "shultz": 2, "uri": 2, "shaham": 2, "vera": 2, "demberg": 2, "victoria": 2, "nyamai": 2, "vika": 2, "raunak": 2, "vinai": 2, "ramasesh": 2, "udai": 2, "prabhu": 2, "vishakh": 2, "padmakumar": 2, "vivek": 2, "srikumar": 2, "fedu": 2, "wout": 2, "vossen": 2, "xiaoyu": 2, "tong": 2, "xinran": 2, "zhao": 2, "xinyi": 2, "xudong": 2, "yadollah": 2, "yaghoobzadeh": 2, "yair": 2, "lakretz": 2, "yangqiu": 2, "yasaman": 2, "bahri": 2, "yichi": 2, "yide": 2, "yifu": 2, "yonatan": 2, "belinkov": 2, "hou": 2, "yufang": 2, "yuntao": 2, "bai": 2, "zachari": 2, "seid": 2, "zhuoy": 2, "zijian": 2, "ziji": 2, "j": [2, 4], "zirui": 2, "ziyi": 2, "extrapol": 2, "2206": 2, "04615": 2, "wpn": 2, "yada": 2, "pruksachatkun": 2, "amanpreet": 2, "julian": 2, "felix": 2, "hill": 2, "stickier": 2, "wsm": 2, "1804": 2, "07461": 2, "wtb": 2, "yi": [2, 4], "tai": 2, "borgeaud": 2, "dani": 2, "yogatama": 2, "denni": 2, "donald": 2, "metzler": 2, "ed": 2, "h": 2, "oriol": 2, "vinyal": 2, "dean": 2, "07682": 2, "wdr": 2, "doolei": 2, "manlei": 2, "arka": 2, "pal": 2, "feuer": 2, "siddhartha": 2, "ravid": 2, "shwartz": 2, "ziv": 2, "khalid": 2, "saifullah": 2, "siddartha": 2, "naidu": 2, "chinmai": 2, "hegd": 2, "lecun": 2, "tom": 2, "goldstein": 2, "willi": 2, "neiswang": 2, "micah": 2, "goldblum": 2, "2406": 2, "19314": 2, "yyh": 2, "baosong": 2, "bo": 2, "chengpeng": 2, "chengyuan": 2, "fei": 2, "guant": 2, "haoran": 2, "huan": 2, "jialong": 2, "jialin": 2, "jianhong": 2, "tu": 2, "jianwei": 2, "jianxin": 2, "jin": 2, "jingren": 2, "jinz": 2, "jinzheng": 2, "junyang": 2, "keme": 2, "lu": 2, "keqin": 2, "kexin": 2, "mingfeng": 2, "xue": 2, "ni": 2, "pei": 2, "ru": 2, "men": 2, "ruiz": 2, "runji": 2, "shiji": 2, "sinan": 2, "tan": 2, "tianhang": 2, "tianhao": 2, "wenbin": 2, "ge": 2, "xiaodong": 2, "deng": 2, "xiaohuan": 2, "xingzhang": 2, "xinyu": 2, "xipin": 2, "xuancheng": 2, "fan": 2, "yichang": 2, "wan": 2, "yunfei": 2, "yuqiong": 2, "zhenru": 2, "zhihao": 2, "2407": 2, "10671": 2, "zc": 2, "siyuan": 2, "zhuang": 2, "zhanghao": 2, "yonghao": 2, "zi": 2, "zhuohan": 2, "xing": 2, "2306": 2, "05685": 2, "huggingface24": 2, "06": [2, 4], "metaai24": 2, "promptfoo24": 2, "toolkit": 2, "dev": 2, "far": 3, "possibli": 3, "eliot": 3, "english": 3, "thumb": 3, "\u00be": 3, "max_output_token": 3, 
"4096": 3, "16384": 3, "contrari": 3, "surpass": 3, "truncat": 3, "max_input_token": 3, "input_cost_per_token": 3, "output_cost_per_token": 3, "11b": 3, "v1": 3, "128000": 3, "5e": 3, "sonnet": 3, "20241022": 3, "8192": 3, "200000": 3, "3e": 3, "0613": 3, "6e": 3, "1e": 3, "gemini": 3, "flash": 3, "1048576": 3, "2097152": 3, "05e": 3, "incomplet": 3, "abruptli": 3, "shallow": 3, "thorough": 3, "dissatisfact": 3, "frustrat": 3, "creation": 3, "feasibl": 3, "split": 3, "10k": 3, "diagram": 3, "charactertextsplitt": 3, "tiktoken": 3, "sequenti": 3, "newlin": 3, "broadli": [3, 4], "want": 3, "sure": [3, 4], "cheap": 3, "speciali": 3, "naiv": 3, "nltk": 3, "spaci": 3, "recurs": 3, "divid": 3, "hierarch": 3, "talk": 3, "theme": 3, "splitter": 3, "markdown": 3, "get_chunk": 3, "chunk_siz": 3, "chunk_overlap": 3, "langchain_text_splitt": 3, "text_splitt": 3, "from_tiktoken_encod": 3, "split_text": 3, "persona": 3, "task": [3, 4], "langchain_cor": [3, 4], "prompttempl": 3, "get_base_prompt_templ": 3, "base_prompt": [3, 4], "from_templ": 3, "llmchain": 3, "togeth": 3, "parser": [3, 4], "output_pars": 3, "stroutputpars": 3, "langchain_commun": 3, "chat_model": 3, "chatlitellm": 3, "get_llm_chain": 3, "prompt_templ": [3, 4], "llm_chain": [3, 4], "api_key_label": 3, "upper": 3, "_api_kei": 3, "get_dynamic_prompt_templ": 3, "get_dynamic_prompt_param": 3, "prompt_param": 3, "part_idx": 3, "total_part": 3, "chat_context": 3, "param": 3, "dynamic_prompt_param": 3, "elif": 3, "merg": 3, "concaten": 3, "generate_report": 3, "input_cont": 3, "llm_model_nam": 3, "report_part": 3, "num_part": 3, "dinam": 3, "priovid": 3, "invok": [3, 4], "cummul": 3, "join": 3, "max_chunk_s": 3, "max_chunk_overlap": 3, "readabl": 3, "apple_report": 3, "luation": 3, "disciplin": 3, "smooth": 3, "subhead": 3, "despit": [3, 4], "depth": 3, "overlook": 3, "preserv": 3, "easier": [3, 4], "preprocess": 3, "necessit": 3, "meticul": 3, "bottleneck": 3, "friendli": 3, "mustafa": 3, "suleyman": 3, "infinit": 3, "fewer": 3, "progress": 3, "condens": 3, "versatil": 3, "drive": [3, 4], "grace": 3, "fallback": 3, "empow": 3, "crucial": [3, 4], "langchain24": 3, "how_to": 3, "freedom": 4, "julia": 4, "easili": 4, "notebook": 4, "overrid": 4, "response_cont": 4, "wow": 4, "lot": 4, "breakdown": 4, "impress": 4, "huge": 4, "ye": 4, "serious": 4, "is_json": 4, "myjson": 4, "valueerror": 4, "trial": 4, "elicit": 4, "wrangl": 4, "ad": 4, "hoc": 4, "streamlin": 4, "subsequ": 4, "dataset": 4, "unwant": 4, "ui": 4, "overflow": 4, "overwhelm": 4, "twitter": 4, "youtub": 4, "publish": 4, "schema": 4, "blueprint": 4, "nativ": 4, "json_format": 4, "person1": 4, "q1": 4, "person2": 4, "nest": 4, "todai": 4, "programmat": 4, "thellm": 4, "unend": 4, "whitespac": 4, "forget": 4, "throw": 4, "somewher": 4, "json_object": 4, "sheer": 4, "circul": 4, "vertex": 4, "worri": 4, "enum": 4, "refus": 4, "simpler": 4, "strongli": 4, "secextract": 4, "mentioned_ent": 4, "mentioned_plac": 4, "extract_from_sec_fil": 4, "sec_filing_text": 4, "hint": 4, "prompt_extract": 4, "sec_extract": 4, "washington": 4, "usabl": 4, "beg": 4, "with_structured_output": 4, "runnabl": 4, "typeddict": 4, "qu": 4, "langchain_openai": 4, "chatopenai": 4, "chatprompttempl": 4, "extract_from_sec_filing_langchain": 4, "structured_llm": 4, "from_messag": 4, "sec_extraction_langchain": 4, "hood": 4, "logit": 4, "regex": 4, "enough": 4, "qwen": 4, "malform": 4, "sec_extraction_outlin": 4, "zsp": 4, "zicorp": 4, "phenomenon": 4, "popular": 4, "cpp": 4, "gbnf": 4, "ggml": 4, "bnf": 4, "ggerganov": 
4, "accomplish": 4, "backu": 4, "naur": 4, "wikipedia": 4, "contributor": 4, "strictli": 4, "soon": 4, "curl": 4, "fssl": 4, "sh": 4, "extract_entities_from_sec_fil": 4, "suffix": 4, "ollama_structured_output_prompt_suffix": 4, "ollama_structured_output_temperatur": 4, "mistral": 4, "llama2": 4, "uncensor": 4, "model_json_schema": 4, "response_json": 4, "wrapper": 4, "exllama2": 4, "mlx": 4, "lm": 4, "medium": 4, "know": 4, "chanc": 4, "correctli": 4, "famili": 4, "furthermor": 4, "nonetheless": 4, "studi": 4, "wrap": 4, "gemma": 4, "uncov": 4, "wors": 4, "extran": 4, "dispar": 4, "preval": 4, "outdat": 4, "rapidli": 4, "fashion": 4, "remark": 4, "me": 4, "speak": 4, "freeli": 4, "aider": 4, "decod": 4, "outweigh": 4, "rebutt": 4, "argu": 4, "v": 4, "reproduct": 4, "paint": 4, "pictur": 4, "verif": 4, "dottxt": 4, "flaw": 4, "uneven": 4, "didn": 4, "conflat": 4, "argument": 4, "drawback": 4, "unlock": 4, "wider": 4, "thank": 4, "pfiffer": 4, "aid24": 4, "dot24": 4, "sai": 4, "demo": 4, "tree": 4, "gge24": 4, "blob": 4, "readm": 4, "llf": 4, "xieyang": 4, "frederick": 4, "fiannaca": 4, "terri": 4, "koo": 4, "dixon": 4, "cai": 4, "ea": 4, "ny": 4, "usa": 4, "machineri": 4, "doi": 4, "1145": 4, "3613905": 4, "3650756": 4, "ln": 4, "xuan": 4, "hai": 4, "nguyen": 4, "ngoc": 4, "tiviati": 4, "sim": 4, "hieu": 4, "dao": 4, "shafiq": 4, "joti": 4, "kenji": 4, "kawaguchi": 4, "nanci": 4, "min": 4, "kan": 4, "2408": 4, "08656": 4, "out24": 4, "twt": 4, "zhi": 4, "cheng": 4, "kuang": 4, "tsai": 4, "chieh": 4, "hung": 4, "yun": 4, "nung": 4, "02442": 4, "wikipediacontributors24": 4, "wiktionari": 4, "naur_form": 4}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"introduct": [0, 1, 4], "content": [0, 2, 3, 4], "core": 0, "challeng": 0, "we": 0, "ll": 0, "address": 0, "A": [0, 1], "practic": [0, 1, 4], "approach": 0, "note": 0, "perspect": 0, "who": 0, "thi": 0, "book": 0, "i": 0, "For": 0, "outcom": 0, "prerequisit": 0, "set": 0, "up": 0, "your": 0, "environ": 0, "python": 0, "setup": 0, "api": [0, 4], "kei": [0, 2, 3], "configur": 0, "code": 0, "repositori": 0, "troubleshoot": 0, "common": 0, "issu": 0, "about": 0, "author": 0, "": 0, "tame": 1, "llm": [1, 2], "guid": 1, "pitfal": 1, "open": 1, "sourc": 1, "softwar": [1, 2], "chapter": 1, "1": [1, 3], "2": [1, 3], "wrestl": [1, 4], "structur": [1, 4], "output": [1, 3, 4], "3": [1, 3], "input": 1, "size": [1, 3], "length": [1, 3], "limit": [1, 3], "4": [1, 3], "5": 1, "The": [1, 2], "eval": [1, 2], "gap": [1, 2], "6": 1, "hallucin": 1, "realiti": 1, "7": 1, "safeti": 1, "concern": 1, "8": 1, "cost": [1, 3], "factor": 1, "9": 1, "break": 1, "free": 1, "from": 1, "cloud": 1, "provid": [1, 4], "appendix": 1, "tool": [1, 2, 4], "resourc": 1, "non": 2, "determinist": 2, "gener": [2, 3], "machin": 2, "temperatur": 2, "sampl": 2, "spectrum": 2, "emerg": 2, "properti": 2, "problem": [2, 3, 4], "statement": [2, 3, 4], "tradit": 2, "v": 2, "design": 2, "applic": 2, "test": 2, "requir": 2, "matrix": 2, "conceptu": 2, "overview": 2, "consider": [2, 3], "metric": 2, "evalu": 2, "task": 2, "model": [2, 3], "base": [2, 3], "human": 2, "benchmark": 2, "leaderboard": 2, "lightev": 2, "mmlu": 2, "econometr": 2, "dataset": 2, "famili": 2, "us": 2, "langsmith": 2, "promptfoo": 2, "refer": [2, 3, 4], "what": 3, "ar": 3, "token": 3, "comparison": [3, 4], "across": 3, "chunk": 3, "contextu": 3, "link": 3, "long": 3, "form": 3, "step": 3, "write": 3, "prompt": [3, 4], "templat": 3, "construct": 3, "dynam": 3, "paramet": 3, "report": 3, "exampl": 3, "usag": 3, 
"discuss": [3, 4], "implic": 3, "futur": 3, "conclus": [3, 4], "user": 4, "need": 4, "solut": 4, "strategi": 4, "techniqu": 4, "One": 4, "shot": 4, "specif": 4, "json": 4, "mode": 4, "langchain": 4, "outlin": 4, "ollama": 4, "compar": 4, "framework": 4, "best": 4, "research": 4, "ongo": 4, "debat": 4, "acknowledg": 4}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinxcontrib.bibtex": 9, "sphinx": 57}, "alltitles": {"Introduction": [[0, "introduction"], [4, "introduction"]], "Contents": [[0, "contents"], [2, "contents"], [3, "contents"], [4, "contents"]], "Core Challenges We\u2019ll Address": [[0, "core-challenges-we-ll-address"]], "A Practical Approach": [[0, "a-practical-approach"]], "A Note on Perspective": [[0, "a-note-on-perspective"]], "Who This Book Is For": [[0, "who-this-book-is-for"]], "Outcomes": [[0, "outcomes"]], "Prerequisites": [[0, "prerequisites"]], "Setting Up Your Environment": [[0, "setting-up-your-environment"]], "Python Environment Setup": [[0, "python-environment-setup"]], "API Keys Configuration": [[0, "api-keys-configuration"]], "Code Repository": [[0, "code-repository"]], "Troubleshooting Common Issues": [[0, "troubleshooting-common-issues"]], "About the Author(s)": [[0, "about-the-author-s"]], "Taming LLMs": [[1, "taming-llms"]], "A Practical Guide to LLM Pitfalls with Open Source Software": [[1, "a-practical-guide-to-llm-pitfalls-with-open-source-software"]], "Chapter 1: Introduction": [[1, "chapter-1-introduction"]], "Chapter 2: Wrestling with Structured Output": [[1, "chapter-2-wrestling-with-structured-output"]], "Chapter 3: Input Size and Length Limitations": [[1, "chapter-3-input-size-and-length-limitations"]], "Chapter 4: Output Size and Length Limitations": [[1, "chapter-4-output-size-and-length-limitations"]], "Chapter 5: The Evals Gap": [[1, "chapter-5-the-evals-gap"]], "Chapter 6: Hallucination: The Reality Gap": [[1, "chapter-6-hallucination-the-reality-gap"]], "Chapter 7: Safety Concerns": [[1, "chapter-7-safety-concerns"]], "Chapter 8: The Cost Factor": [[1, "chapter-8-the-cost-factor"]], "Chapter 9: Breaking Free from Cloud Providers": [[1, "chapter-9-breaking-free-from-cloud-providers"]], "Appendix A: Tools and Resources": [[1, "appendix-a-tools-and-resources"]], "The Evals Gap": [[2, "the-evals-gap"]], "Non-Deterministic Generative Machines": [[2, "non-deterministic-generative-machines"]], "Temperature and Sampling": [[2, "temperature-and-sampling"]], "The Temperature Spectrum": [[2, "the-temperature-spectrum"]], "Emerging Properties": [[2, "emerging-properties"]], "Problem Statement": [[2, "problem-statement"], [3, "problem-statement"], [4, "problem-statement"]], "Evals of Traditional Software vs LLMs": [[2, "evals-table"]], "Evals Design": [[2, "evals-design"]], "LLM Application Testing Requirements Matrix": [[2, "validation-requirements"]], "Conceptual Overview": [[2, "conceptual-overview"]], "Design Considerations": [[2, "design-considerations"]], "Metrics": [[2, "metrics"]], "Key Metrics for Evaluating Generative Tasks": [[2, "key-metrics"]], "Evaluators": [[2, "evaluators"]], "Model-Based Evaluation": [[2, "model-based-evaluation"]], "Human-Based Evaluation": [[2, "human-based-evaluation"]], "Evaluating Evaluators": [[2, "evaluating-evaluators"]], "Benchmarks and 
Leaderboards": [[2, "benchmarks-and-leaderboards"]], "Tools": [[2, "tools"]], "LightEval": [[2, "lighteval"]], "MMLU Econometrics Task Dataset sample": [[2, "mmlu-econometrics"]], "Model Families Evaluated Using LightEval": [[2, "model-families"]], "LangSmith": [[2, "langsmith"]], "PromptFoo": [[2, "promptfoo"]], "References": [[2, "references"], [3, "references"], [4, "references"]], "Output Size Limitations": [[3, "output-size-limitations"]], "What are Token Limits?": [[3, "what-are-token-limits"]], "Token Cost and Length Limitation Comparison Across Key Models": [[3, "token-cost-table"]], "Content Chunking with Contextual Linking": [[3, "content-chunking-with-contextual-linking"]], "Generating long-form content": [[3, "generating-long-form-content"]], "Step 1: Chunking the Content": [[3, "step-1-chunking-the-content"]], "Step 2: Writing the Base Prompt Template": [[3, "step-2-writing-the-base-prompt-template"]], "Step 3: Constructing Dynamic Prompt Parameters": [[3, "step-3-constructing-dynamic-prompt-parameters"]], "Step 4: Generating the Report": [[3, "step-4-generating-the-report"]], "Example Usage": [[3, "example-usage"]], "Discussion": [[3, "discussion"], [4, "discussion"]], "Implications": [[3, "implications"]], "Future Considerations": [[3, "future-considerations"]], "Conclusion": [[3, "conclusion"], [4, "conclusion"]], "Wrestling with Structured Output": [[4, "wrestling-with-structured-output"]], "User Needs": [[4, "user-needs"]], "Solutions": [[4, "solutions"]], "Strategies": [[4, "strategies"]], "Techniques and Tools": [[4, "techniques-and-tools"]], "One-Shot Prompts": [[4, "one-shot-prompts"]], "Structured Output with Provider-Specific APIs": [[4, "structured-output-with-provider-specific-apis"]], "JSON Mode": [[4, "json-mode"]], "LangChain": [[4, "langchain"]], "Outlines": [[4, "outlines"]], "Ollama": [[4, "ollama"]], "Comparing Solutions": [[4, "comparing-solutions"]], "Structured Output Frameworks Comparison": [[4, "structured-output-frameworks"]], "Best Practices": [[4, "best-practices"]], "Research and Ongoing Debate": [[4, "research-and-ongoing-debate"]], "Acknowledgements": [[4, "acknowledgements"]]}, "indexentries": {}}) \ No newline at end of file +Search.setIndex({"docnames": ["markdown/intro", "markdown/toc", "notebooks/evals", "notebooks/output_size_limit", "notebooks/structured_output"], "filenames": ["markdown/intro.md", "markdown/toc.md", "notebooks/evals.ipynb", "notebooks/output_size_limit.ipynb", "notebooks/structured_output.ipynb"], "titles": ["1. Introduction", "Taming LLMs", "4. The Evals Gap", "2. Output Size Limitations", "3. 
Wrestling with Structured Output"], "terms": {"am": 0, "alwai": [0, 2, 4], "do": [0, 2, 3, 4], "which": [0, 2, 3, 4], "cannot": [0, 2], "order": [0, 2, 4], "mai": [0, 2, 3, 4], "learn": [0, 2], "how": [0, 2, 3, 4], "pablo": [0, 2], "picasso": 0, "In": [0, 2, 3, 4], "recent": [0, 2, 4], "year": [0, 2, 3, 4], "larg": [0, 1, 2, 3, 4], "languag": [0, 1, 2, 3, 4], "model": [0, 1, 4], "llm": [0, 3, 4], "have": [0, 2, 3, 4], "emerg": [0, 1, 4], "transform": [0, 2, 4], "forc": [0, 2, 4], "technologi": [0, 2, 3, 4], "promis": [0, 2], "revolution": 0, "build": [0, 1, 2, 3, 4], "product": [0, 1, 2, 4], "interact": [0, 2, 3, 4], "comput": [0, 2, 3, 4], "from": [0, 2, 3, 4], "chatgpt": [0, 4], "github": [0, 2, 4], "copilot": 0, "claud": [0, 2, 3], "artifact": 0, "system": [0, 2, 3, 4], "captur": [0, 2], "public": [0, 2], "imagin": 0, "spark": 0, "gold": [0, 2], "rush": 0, "ai": [0, 2, 4], "power": [0, 1, 2, 3, 4], "applic": [0, 1, 3, 4], "howev": [0, 2, 3, 4], "beneath": 0, "surfac": [0, 2], "technolog": [0, 2], "revolut": 0, "li": [0, 2], "complex": [0, 2, 3, 4], "landscap": [0, 2], "practition": [0, 2], "must": [0, 2, 3], "navig": [0, 1, 2], "focus": [0, 2, 3, 4], "bring": 0, "awar": [0, 2, 3], "limit": [0, 2, 4], "har": [0, 1, 3], "open": [0, 2, 3, 4], "sourc": [0, 2, 4], "solut": [0, 1, 2, 3], "overcom": [0, 3], "them": [0, 2, 3, 4], "robust": [0, 2, 3, 4], "It": [0, 2, 3, 4], "offer": [0, 2, 3, 4], "critic": [0, 1, 2, 3, 4], "implement": [0, 1, 2, 3, 4], "back": [0, 2, 4], "reproduc": [0, 1, 2], "exampl": [0, 1, 2, 4], "while": [0, 1, 2, 3, 4], "mani": [0, 2, 3, 4], "resourc": [0, 2, 3], "cover": [0, 2, 3], "capabl": [0, 1, 2, 3, 4], "specif": [0, 1, 2, 3], "hidden": 0, "pitfal": 0, "engin": [0, 1, 2, 4], "technic": [0, 1, 2, 3, 4], "manag": [0, 1, 2, 3, 4], "face": [0, 2], "when": [0, 1, 2, 3, 4], "comprehens": [0, 1, 2, 3, 4], "guid": [0, 2, 4], "leverag": [0, 2, 3, 4], "battl": [0, 1], "test": [0, 1, 4], "tool": [0, 3], "throughout": [0, 2, 3, 4], "tackl": [0, 2], "follow": [0, 2, 3, 4], "non": [0, 1, 4], "exhaust": 0, "list": [0, 2, 3, 4], "structur": [0, 2, 3], "un": 0, "reliabl": [0, 2, 4], "struggl": [0, 2, 4], "maintain": [0, 2, 3, 4], "consist": [0, 2, 3, 4], "output": [0, 2], "format": [0, 2, 3, 4], "complic": 0, "integr": [0, 2, 4], "larger": [0, 2, 3, 4], "make": [0, 2, 3, 4], "error": [0, 2, 4], "handl": [0, 1, 2, 3, 4], "more": [0, 2, 3, 4], "size": [0, 2, 4], "length": [0, 2, 4], "constraint": [0, 1, 2, 3, 4], "strict": [0, 4], "token": [0, 1, 2, 4], "both": [0, 2], "input": [0, 2, 3, 4], "requir": [0, 3, 4], "care": [0, 2, 4], "chunk": [0, 1], "strategi": [0, 1, 2, 3], "long": [0, 1, 2, 4], "form": [0, 1, 2, 4], "effect": [0, 2, 3, 4], "tradit": 0, "softwar": [0, 4], "methodologi": [0, 2, 4], "break": [0, 2, 3], "down": [0, 2, 3], "deal": 0, "determinist": [0, 1, 4], "gener": [0, 1, 4], "new": [0, 2, 3, 4], "hallucin": [0, 2, 4], "These": [0, 2, 3, 4], "can": [0, 2, 3, 4], "plausibl": 0, "sound": 0, "entir": [0, 2, 3, 4], "fabric": [0, 2], "inform": [0, 2, 3, 4], "creat": [0, 2, 3, 4], "signific": [0, 2, 3, 4], "risk": [0, 2, 3], "safeti": [0, 2, 4], "secur": [0, 2, 3, 4], "harm": [0, 2], "bias": [0, 2, 4], "inappropri": 0, "safeguard": [0, 2], "monitor": [0, 1, 2], "ensur": [0, 2, 3, 4], "safe": [0, 2, 4], "deploy": [0, 1, 2, 4], "cost": [0, 2, 4], "optim": [0, 1, 2, 3], "The": [0, 3, 4], "financi": [0, 2, 3, 4], "oper": [0, 2, 3, 4], "base": [0, 1, 4], "quickli": [0, 3], "becom": [0, 2, 4], "prohibit": [0, 2], "without": [0, 2, 3, 4], "observ": [0, 2, 4], "vendor": [0, 1, 2], 
"lock": [0, 1], "cloud": [0, 2, 4], "provid": [0, 2, 3], "depend": [0, 2, 4], "through": [0, 1, 2, 3, 4], "proprietari": [0, 4], "infrastructur": 0, "difficult": [0, 2], "switch": 0, "self": [0, 1, 2], "host": [0, 1, 2], "take": [0, 1, 2, 3, 4], "hand": [0, 3, 4], "concret": [0, 1], "you": [0, 2, 3, 4], "run": [0, 2, 4], "modifi": [0, 2], "real": [0, 2, 3, 4], "world": [0, 2, 4], "scenario": [0, 2, 4], "best": [0, 1, 2], "techniqu": [0, 1, 2, 3], "pattern": [0, 1, 2, 4], "anti": [0, 2], "look": [0, 1, 2], "our": [0, 2, 3, 4], "goal": [0, 2, 3], "discourag": 0, "us": [0, 3, 4], "enabl": [0, 2, 3, 4], "By": [0, 1, 2, 3, 4], "understand": [0, 1, 2, 3, 4], "upfront": [0, 1], "better": [0, 1, 2, 3], "equip": [0, 1, 2], "avoid": [0, 2, 4], "current": [0, 1, 2, 3, 4], "discours": [0, 1], "around": [0, 1, 2, 3, 4], "tend": [0, 1, 2], "toward": [0, 2, 4], "extrem": [0, 2], "either": [0, 2, 3], "uncrit": 0, "enthusiasm": 0, "wholesal": [0, 2], "dismiss": 0, "differ": [0, 2, 3, 4], "focu": [0, 1, 2, 3, 4], "rather": [0, 2], "than": [0, 2], "theoret": 0, "examin": [0, 2, 3, 4], "first": [0, 2, 3, 4], "everi": [0, 2], "concept": [0, 2], "illustr": [0, 2, 3, 4], "execut": [0, 2], "immedi": [0, 2], "analysi": [0, 1, 2, 3], "balanc": [0, 2, 3, 4], "help": [0, 2, 3, 4], "reader": [0, 1], "decis": [0, 2, 4], "intend": [0, 2], "develop": [0, 2, 3, 4], "step": [0, 1, 2, 4], "insight": [0, 2, 3, 4], "along": [0, 2], "guidanc": [0, 4], "framework": [0, 2], "could": [0, 2, 3, 4], "derail": 0, "project": [0, 2], "earli": [0, 2, 4], "befor": [0, 2, 4], "thei": [0, 2, 3, 4], "costli": [0, 2], "problem": [0, 1], "too": [0, 2, 3], "late": 0, "lifecycl": 0, "design": [0, 1, 3, 4], "lead": [0, 2, 3, 4], "genai": 0, "initi": [0, 2, 3, 4], "leader": [0, 2], "architectur": [0, 2, 3, 4], "advoc": 0, "anyon": 0, "seek": [0, 2], "work": [0, 1, 2, 3, 4], "typic": [0, 2, 3, 4], "job": [0, 2], "role": [0, 2, 3, 4], "platform": [0, 2, 3, 4], "backend": [0, 2], "exist": [0, 2], "ml": 0, "transit": [0, 2, 3, 4], "overse": 0, "motiv": [0, 2, 4], "need": [0, 2, 3], "readi": [0, 2], "desir": [0, 2, 4], "perform": [0, 1, 2, 3, 4], "after": [0, 2, 3, 4], "read": [0, 2, 3, 4], "implic": [0, 1, 2], "experi": [0, 2, 3, 4], "recommend": [0, 2, 3, 4], "abl": [0, 2, 3, 4], "deploi": [0, 2, 3], "proper": [0, 4], "realist": 0, "effort": [0, 2, 4], "estim": [0, 2], "impact": [0, 2, 3, 4], "timelin": 0, "To": [0, 2, 3, 4], "most": [0, 2, 3, 4], "should": [0, 2, 3, 4], "basic": [0, 2, 3], "program": [0, 2], "knowledg": [0, 2], "introductori": [0, 1], "langchain": [0, 1, 2, 3], "e": [0, 2, 3, 4], "g": [0, 2, 3, 4], "chat": [0, 2, 3, 4], "prompt": [0, 1, 2], "templat": [0, 1, 2], "access": [0, 2, 3, 4], "openai": [0, 2, 4], "anthrop": [0, 4], "similar": [0, 2, 4], "grade": 0, "dive": 0, "here": [0, 2, 3, 4], "get": [0, 2, 3, 4], "start": [0, 2, 4], "activ": [0, 2], "virtual": [0, 2], "m": [0, 2, 4], "venv": [0, 2], "env": [0, 2, 3, 4], "bin": 0, "On": [0, 2, 4], "window": [0, 1, 2], "script": 0, "instal": [0, 2, 4], "packag": [0, 2, 4], "pip": [0, 2, 4], "r": [0, 2, 3, 4], "txt": [0, 2, 3, 4], "file": [0, 2, 3, 4], "root": 0, "directori": [0, 2], "add": [0, 3], "other": [0, 2, 3, 4], "sensit": [0, 2], "openai_api_kei": 0, "your_openai_api_key_her": 0, "never": [0, 4], "share": [0, 2, 4], "commit": [0, 2], "version": [0, 2, 4], "control": [0, 2, 4], "contain": [0, 2, 3, 4], "kept": [0, 2], "privat": [0, 2], "clone": 0, "companion": 0, "git": 0, "http": [0, 2, 3, 4], "com": [0, 2, 3, 4], "souzatharsi": 0, "tamingllm": [0, 2], "cd": 0, "If": [0, 2, 
4], "encount": [0, 1, 2], "rate": [0, 2], "consid": [0, 2, 3, 4], "smaller": [0, 2, 3, 4], "retri": [0, 4], "logic": [0, 2, 3], "conflict": [0, 2], "try": [0, 2, 4], "fresh": 0, "like": [0, 2, 3, 4], "poetri": 0, "check": [0, 2, 4], "page": [0, 2], "known": [0, 2, 4], "now": [0, 2, 3, 4], "let": [0, 2, 3, 4], "begin": [0, 2, 4], "explor": [0, 2, 4], "dr": 0, "tharsi": 0, "souza": 0, "scientist": 0, "special": [0, 2, 4], "he": [0, 2], "lectur": 0, "columbia": 0, "univers": [0, 2], "master": [0, 4], "scienc": [0, 2], "appli": [0, 2, 3, 4], "analyt": 0, "head": [0, 2, 3], "equiti": [0, 2], "citadel": 0, "former": [0, 2], "senior": [0, 2], "vp": 0, "two": [0, 2, 3, 4], "sigma": 0, "invest": [0, 2, 4], "With": [0, 2], "over": [0, 1, 2, 3, 4], "15": [0, 2, 4], "deliv": [0, 2], "across": [0, 2, 4], "startup": 0, "fortun": 0, "500": [0, 2], "compani": [0, 2, 3, 4], "global": [0, 2], "also": [0, 2, 3, 4], "an": [0, 1, 2, 3, 4], "numer": [0, 2], "scholarli": 0, "frequent": [0, 2, 4], "speaker": [0, 2], "academ": [0, 2], "busi": [0, 2], "confer": [0, 4], "ground": [0, 1, 2], "background": [0, 2, 3], "draw": [0, 2, 4], "scale": [0, 2, 4], "stage": [0, 4], "major": [0, 2, 4], "institut": [0, 2], "well": [0, 2, 4], "advis": 0, "profit": [0, 2, 3, 4], "organ": [0, 2, 3], "contribut": [0, 2, 3], "uniqu": [0, 2], "bridg": 0, "gap": 0, "between": [0, 2, 3, 4], "potenti": [0, 2, 3, 4], "next": [0, 2, 4], "hold": [0, 2], "ph": 0, "d": [0, 2, 4], "ucl": 0, "london": 0, "phil": 0, "sc": 0, "b": [0, 2, 4], "abstract": [1, 2, 4], "heavili": [1, 2, 4], "gloss": 1, "fundament": [1, 2, 4], "challeng": [1, 2, 3, 4], "convers": [1, 2, 3, 4], "thi": [1, 2, 3, 4], "book": [1, 2], "kei": [1, 4], "python": [1, 2, 3, 4], "proven": 1, "yet": [1, 2, 3], "i": [1, 2, 3, 4], "unstructur": [1, 4], "context": [1, 2, 3, 4], "code": [1, 2, 4], "sidestep": 1, "inher": [1, 2, 3, 4], "core": [1, 2], "we": [1, 2, 3, 4], "ll": [1, 2], "address": [1, 2, 3, 4], "approach": [1, 2, 3, 4], "note": [1, 2, 3, 4], "perspect": 1, "who": [1, 2, 3, 4], "For": [1, 2, 3, 4], "outcom": [1, 2, 4], "prerequisit": 1, "set": [1, 2, 3, 4], "up": [1, 2, 3, 4], "your": [1, 2, 3, 4], "environ": [1, 2, 3, 4], "setup": [1, 2, 4], "api": [1, 2], "configur": [1, 2], "repositori": [1, 2], "troubleshoot": 1, "common": [1, 2, 3, 4], "issu": [1, 2, 3, 4], "about": [1, 2, 3, 4], "author": [1, 2, 4], "": [1, 2, 3, 4], "statement": 1, "One": [1, 2], "shot": [1, 2], "json": [1, 2, 3], "mode": 1, "outlin": [1, 2], "multipl": [1, 2, 3, 4], "choic": [1, 2, 4], "pydant": [1, 2, 4], "discuss": [1, 2], "compar": [1, 2, 3], "research": [1, 2, 3], "ongo": [1, 2], "debat": 1, "conclus": [1, 2], "acknowledg": [1, 2], "refer": 1, "content": 1, "what": [1, 2, 4], "ar": [1, 2, 4], "contextu": [1, 2], "link": [1, 2], "write": [1, 2, 4], "construct": [1, 2, 4], "dynam": [1, 2], "paramet": [1, 2, 4], "report": [1, 2, 4], "usag": [1, 2, 4], "futur": [1, 2], "consider": [1, 4], "machin": [1, 4], "temperatur": [1, 3, 4], "sampl": [1, 3, 4], "spectrum": 1, "properti": 1, "conceptu": [1, 4], "overview": [1, 4], "compon": [1, 2], "metric": 1, "evalu": [1, 3, 4], "human": [1, 3, 4], "benchmark": 1, "leaderboard": 1, "type": [1, 2, 3, 4], "detect": [1, 2, 4], "retriev": [1, 2], "augment": [1, 2], "rag": 1, "select": [1, 2], "index": [1, 2, 3, 4], "vector": 1, "store": [1, 2, 3], "method": [1, 2, 3, 4], "pipelin": [1, 2, 4], "valid": [1, 2, 4], "guard": 1, "filter": [1, 2, 4], "sanit": 1, "alert": 1, "cach": [1, 2], "invalid": [1, 4], "predict": [1, 2, 4], "llama": [1, 2, 4], "llamafil": 1, 
"ollama": 1, "migrat": 1, "commun": [1, 2, 4], "doesn": [2, 3, 4], "t": [2, 3, 4], "matter": 2, "beauti": 2, "theori": 2, "smart": 2, "agre": 2, "wrong": 2, "richard": 2, "feynman": 2, "natur": [2, 3, 4], "unlik": 2, "where": [2, 3, 4], "same": [2, 3, 4], "produc": [2, 4], "novel": 2, "text": [2, 3, 4], "train": [2, 4], "data": [2, 3, 4], "respons": [2, 3, 4], "each": [2, 3, 4], "time": [2, 3, 4], "re": [2, 3, 4], "queri": 2, "even": [2, 3, 4], "ident": 2, "behavior": 2, "strength": 2, "ask": [2, 4], "question": [2, 4], "isn": 2, "bug": 2, "featur": [2, 4], "random": [2, 4], "allow": [2, 3, 4], "creativ": [2, 4], "divers": [2, 3, 4], "testabl": 2, "servic": [2, 3, 4], "advic": 2, "mean": [2, 3, 4], "yield": 2, "exceedingli": 2, "regulatori": 2, "complianc": [2, 4], "guarante": [2, 4], "user": [2, 3], "trust": [2, 4], "affect": 2, "inconsist": [2, 4], "primari": 2, "determin": [2, 3, 4], "come": [2, 3, 4], "dure": [2, 4], "calcul": 2, "probabl": [2, 4], "distribut": [2, 4], "nucleu": 2, "holtzman": 2, "et": [2, 4], "al": [2, 4], "2020": 2, "top": [2, 4], "k": [2, 3, 4], "coher": [2, 3], "0": [2, 3, 4], "repetit": [2, 3, 4], "1": [2, 4], "increas": [2, 3, 4], "incoher": 2, "dotenv": [2, 3, 4], "import": [2, 3, 4], "load_dotenv": [2, 3, 4], "o": [2, 3, 4], "load": [2, 3, 4], "variabl": [2, 3, 4], "panda": 2, "pd": 2, "def": [2, 3, 4], "generate_respons": 2, "model_nam": [2, 3], "str": [2, 3, 4], "float": [2, 3], "attempt": [2, 3], "int": [2, 3], "3": [2, 4], "datafram": 2, "demonstr": [2, 3, 4], "client": [2, 4], "result": [2, 3, 4], "temp": 2, "rang": [2, 3, 4], "complet": [2, 3, 4], "messag": [2, 4], "max_token": 2, "50": 2, "append": [2, 3, 4], "displai": [2, 4], "group": [2, 3], "df_result": 2, "print": [2, 3, 4], "f": [2, 3, 4], "ntemperatur": 2, "40": 2, "temp_respons": 2, "_": [2, 4], "row": 2, "iterrow": 2, "return": [2, 3, 4], "max_length": [2, 4], "10000": [2, 3, 4], "appl": [2, 3, 4], "sec_fil": [2, 4], "unit": [2, 3, 4], "state": [2, 3, 4], "nsecur": 2, "AND": [2, 4], "exchang": [2, 3, 4], "commiss": [2, 3, 4], "nwashington": 2, "c": [2, 4], "20549": 2, "n": [2, 3, 4], "nform": 2, "10": [2, 3, 4], "mark": 2, "annual": 2, "pursuant": 2, "TO": 2, "section": [2, 3, 4], "13": 2, "OR": 2, "OF": 2, "THE": 2, "act": 2, "1934": 2, "nfor": 2, "fiscal": [2, 3], "end": [2, 3, 4], "septemb": [2, 3], "28": [2, 3], "2024": [2, 3, 4], "nor": 2, "period": [2, 3], "ncommiss": 2, "number": [2, 3, 4], "001": 2, "36743": 2, "ng66145g66i43": 2, "jpg": 2, "nappl": 2, "inc": [2, 3, 4], "exact": 2, "name": [2, 3, 4], "registr": 2, "specifi": [2, 3, 4], "its": [2, 3, 4], "charter": 2, "ncalifornia": 2, "t94": 2, "2404110": 2, "jurisdict": 2, "nof": 2, "incorpor": 2, "employ": 2, "identif": 2, "No": [2, 4], "none": 2, "park": 2, "wai": [2, 3, 4], "ncupertino": 2, "california": [2, 4], "n95014": 2, "princip": 2, "offic": 2, "zip": 2, "408": 2, "996": 2, "1010": 2, "telephon": 2, "includ": [2, 3, 4], "area": [2, 4], "regist": 2, "12": [2, 3], "ntitl": 2, "class": [2, 3, 4], "ttrade": 2, "symbol": 2, "tname": 2, "ncommon": 2, "stock": [2, 4], "00001": 2, "par": 2, "valu": [2, 3, 4], "per": [2, 3], "naapl": 2, "tthe": 2, "nasdaq": [2, 4], "market": [2, 3, 4], "llc": [2, 4], "n0": 2, "000": [2, 4], "due": [2, 3], "2025": 2, "875": 2, "n1": 2, "625": 2, "2026": 2, "n2": 2, "2027": 2, "375": 2, "2029": 2, "n3": 2, "050": 2, "2031": 2, "600": 2, "2042": 2, "nindic": 2, "season": 2, "issuer": 2, "defin": [2, 3, 4], "rule": [2, 3, 4], "405": 2, "nye": 2, "whether": [2, 3, 4], "ha": [2, 4], "all": [2, 3, 4], 
"preced": 2, "month": 2, "shorter": 2, "wa": [2, 4], "2": [2, 4], "been": 2, "subject": 2, "past": 2, "90": 2, "dai": [2, 4], "submit": 2, "electron": 2, "regul": [2, 4], "232": 2, "chapter": 2, "acceler": 2, "filer": 2, "growth": 2, "see": [2, 4], "definit": [2, 4], "12b": 2, "nlarg": 2, "tacceler": 2, "nnon": 2, "tsmaller": 2, "nemerg": 2, "nif": 2, "indic": [2, 4], "elect": 2, "extend": [2, 4], "compli": [2, 4], "ani": [2, 3, 4], "revis": 2, "account": 2, "standard": 2, "attest": 2, "assess": [2, 3], "intern": 2, "under": [2, 4], "404": 2, "sarban": 2, "oxlei": 2, "u": [2, 4], "7262": 2, "firm": 2, "prepar": [2, 3], "audit": 2, "reflect": 2, "correct": [2, 4], "previous": [2, 3, 4], "those": [2, 3, 4], "restat": 2, "recoveri": 2, "incent": 2, "compens": 2, "receiv": [2, 3, 4], "relev": 2, "240": 2, "10d": 2, "shell": 2, "nthe": 2, "aggreg": 2, "vote": 2, "held": [2, 4], "affili": [2, 4], "march": [2, 4], "29": [2, 4], "last": [2, 3, 4], "second": [2, 3], "quarter": 2, "approxim": [2, 4], "628": [2, 4], "553": [2, 4], "sole": 2, "purpos": [2, 4], "disclosur": 2, "director": 2, "date": [2, 4], "exclud": 2, "becaus": 2, "person": [2, 4], "deem": 2, "necessarili": 2, "n15": 2, "115": [2, 4], "823": [2, 4], "were": [2, 4], "outstand": [2, 4], "octob": [2, 4], "18": [2, 4], "ndocument": 2, "BY": 2, "nportion": 2, "proxi": 2, "relat": 2, "meet": [2, 4], "sharehold": 2, "part": [2, 3, 4], "iii": 2, "within": [2, 3, 4], "120": 2, "ntabl": 2, "npage": 2, "npart": 2, "nitem": 2, "nbusi": 2, "1a": 2, "nrisk": 2, "factor": [2, 3, 4], "n5": 2, "1b": 2, "nunresolv": 2, "staff": 2, "comment": 2, "n17": 2, "1c": 2, "ncybersecur": 2, "nproperti": 2, "n18": 2, "nlegal": 2, "proceed": 2, "4": [2, 4], "nmine": 2, "ii": [2, 4], "5": [2, 3, 4], "nmarket": 2, "stockhold": 2, "purchas": 2, "n19": 2, "6": [2, 3, 4], "reserv": 2, "n20": 2, "7": [2, 3], "nmanag": 2, "condit": 2, "n21": 2, "7a": 2, "nquantit": 2, "qualit": 2, "n27": 2, "8": [2, 3], "nfinanci": 2, "supplementari": 2, "n28": 2, "9": 2, "nchang": 2, "disagr": 2, "n51": 2, "9a": 2, "ncontrol": 2, "procedur": 2, "9b": 2, "nother": 2, "n52": 2, "9c": 2, "ndisclosur": 2, "regard": 2, "foreign": 2, "prevent": [2, 4], "inspect": 2, "ndirector": 2, "corpor": 2, "govern": 2, "11": 2, "nexecut": 2, "ownership": 2, "certain": [2, 3, 4], "benefici": 2, "owner": 2, "ncertain": 2, "relationship": 2, "transact": 2, "independ": [2, 4], "14": [2, 4], "nprincip": 2, "fee": 2, "iv": 2, "nexhibit": 2, "schedul": 2, "n53": 2, "16": 2, "summari": [2, 4], "n56": 2, "nthi": 2, "forward": 2, "litig": 2, "reform": 2, "1995": 2, "involv": [2, 4], "uncertainti": 2, "locat": 2, "item": 2, "expect": [2, 3, 4], "event": 2, "assumpt": 2, "doe": [2, 3, 4], "directli": [2, 4], "histor": 2, "fact": 2, "macroeconom": 2, "identifi": [2, 3, 4], "word": [2, 3, 4], "anticip": 2, "believ": [2, 4], "plan": [2, 4], "would": [2, 3, 4], "term": [2, 3], "actual": [2, 3, 4], "significantli": [2, 3], "might": [2, 3, 4], "caus": 2, "assum": [2, 3], "oblig": [2, 3], "updat": [2, 3, 4], "reason": [2, 3, 4], "except": [2, 4], "law": 2, "nunless": 2, "otherwis": 2, "present": [2, 3, 4], "herein": 2, "calendar": 2, "particular": [2, 4], "associ": [2, 3, 4], "collect": [2, 3], "wholli": 2, "own": [2, 3], "subsidiari": 2, "unless": 2, "ncompani": 2, "manufactur": 2, "smartphon": 2, "tablet": 2, "wearabl": [2, 4], "accessori": 2, "sell": 2, "varieti": 2, "52": 2, "53": 2, "week": 2, "saturdai": 2, "nproduct": 2, "niphon": 2, "line": 2, "io": [2, 4], "iphon": [2, 4], "pro": [2, 3], "se": 2, "nmac": 2, 
"maco": 2, "mac": [2, 4], "laptop": 2, "macbook": 2, "air": 2, "desktop": 2, "imac": 2, "mini": [2, 3, 4], "studio": 2, "nipad": 2, "multipurpos": 2, "ipado": 2, "ipad": [2, 4], "nwearabl": 2, "home": 2, "smartwatch": 2, "wireless": 2, "headphon": 2, "spatial": 2, "watcho": 2, "watch": 2, "ultra": 2, "seri": 2, "airpod": 2, "max": 2, "beat": 2, "vision": 2, "visiono": 2, "nhome": 2, "tv": 2, "media": 2, "stream": [2, 4], "game": 2, "devic": [2, 4], "tvo": 2, "homepod": 2, "high": [2, 3], "fidel": 2, "naccessori": 2, "brand": 2, "third": 2, "parti": 2, "nservic": 2, "nadvertis": 2, "advertis": 2, "licens": 2, "arrang": 2, "napplecar": 2, "portfolio": [2, 4], "support": [2, 4], "applecar": 2, "prioriti": 2, "network": [2, 4], "repair": 2, "replac": 2, "case": [2, 3, 4], "addit": [2, 3, 4], "coverag": 2, "instanc": [2, 3], "accident": 2, "damag": 2, "theft": 2, "loss": 2, "countri": 2, "ncloud": 2, "keep": [2, 3], "custom": 2, "avail": [2, 3, 4], "ndigit": 2, "variou": [2, 3, 4], "app": 2, "discov": 2, "download": 2, "digit": 2, "music": 2, "video": 2, "podcast": 2, "subscript": 2, "arcad": 2, "fit": [2, 3, 4], "sm": 2, "curat": 2, "listen": 2, "demand": [2, 4], "radio": 2, "station": 2, "magazin": 2, "exclus": 2, "origin": [2, 3, 4], "live": 2, "sport": 2, "npayment": 2, "payment": 2, "card": 2, "co": 2, "credit": 2, "pai": 2, "cashless": 2, "nsegment": 2, "primarili": 2, "geograph": 2, "basi": 2, "segment": [2, 3, 4], "america": 2, "europ": 2, "greater": 2, "china": 2, "japan": 2, "rest": 2, "asia": 2, "pacif": 2, "north": 2, "south": 2, "european": 2, "india": 2, "middl": 2, "east": 2, "africa": 2, "mainland": 2, "hong": 2, "kong": 2, "taiwan": 2, "australia": 2, "asian": 2, "although": 2, "hardwar": 2, "one": [2, 3, 4], "separ": [2, 3], "align": [2, 3, 4], "partner": 2, "region": 2, "consum": [2, 4], "small": [2, 4], "mid": [2, 3], "educ": [2, 3], "enterpris": [2, 4], "resel": 2, "retail": 2, "onlin": 2, "direct": 2, "sale": 2, "emploi": [2, 4], "indirect": 2, "channel": 2, "cellular": 2, "carrier": 2, "net": [2, 4], "38": 2, "62": 2, "respect": 2, "total": [2, 3, 4], "ncompetit": 2, "highli": [2, 4], "competit": 2, "character": 2, "aggress": 2, "price": 2, "downward": 2, "pressur": 2, "gross": 2, "margin": [2, 4], "introduct": [2, 3], "short": [2, 3, 4], "life": 2, "cycl": 2, "evolv": [2, 3], "industri": [2, 4], "continu": [2, 3, 4], "improv": [2, 3, 4], "characterist": 2, "rapid": 2, "adopt": [2, 4], "advanc": [2, 3, 4], "competitor": 2, "compet": 2, "veri": 2, "low": [2, 4], "imit": 2, "infring": 2, "intellectu": 2, "abil": [2, 4], "successfulli": [2, 4], "innov": [2, 3], "marketplac": 2, "nearli": 2, "rel": 2, "qualiti": [2, 3, 4], "strong": [2, 4], "ecosystem": 2, "reput": 2, "expand": 2, "opportun": 2, "substanti": 2, "establish": 2, "some": [2, 3, 4], "broader": 2, "lower": [2, 4], "particularli": [2, 3, 4], "intens": [2, 4], "cut": [2, 3], "littl": 2, "free": 2, "illegitim": 2, "obtain": [2, 4], "collabor": 2, "nsuppli": 2, "nalthough": 2, "essenti": [2, 3, 4], "singl": [2, 3, 4], "particip": 2, "therefor": 2, "wide": [2, 3, 4], "shortag": 2, "commod": 2, "fluctuat": 2, "commonli": 2, "introduc": [2, 3, 4], "often": [2, 3, 4], "util": [2, 3], "onli": [2, 3, 4], "capac": 2, "until": [2, 4], "supplier": 2, "matur": 2, "accept": 2, "decid": [2, 3], "concentr": 2, "instead": [2, 3, 4], "enter": 2, "agreement": 2, "suppli": [2, 4], "renew": 2, "nresearch": 2, "nbecaus": 2, "upon": [2, 3], "flow": [2, 3], "enhanc": [2, 3, 4], "acquisit": 2, "nintellectu": 2, "broad": [2, 4], "right": 2, 
"aspect": [2, 3, 4], "patent": 2, "copyright": 2, "trademark": 2, "trade": [2, 4], "secret": 2, "differenti": 2, "success": [2, 4], "reli": 2, "skill": 2, "personnel": 2, "regularli": 2, "protect": 2, "aris": 2, "pursu": 2, "thousand": 2, "accumul": 2, "durat": 2, "adequ": 2, "nin": 2, "necessari": [2, 3], "process": [2, 3, 4], "commerci": [2, 4], "experienc": 2, "higher": 2, "holidai": 2, "addition": 2, "expens": 2, "fill": 2, "inventori": 2, "launch": 2, "older": 2, "declin": 2, "newer": 2, "distributor": 2, "nhuman": 2, "capit": [2, 3, 4], "peopl": 2, "plai": [2, 4], "strive": 2, "attract": 2, "retain": [2, 3], "talent": 2, "inclus": [2, 3, 4], "team": [2, 4], "member": 2, "so": [2, 4], "As": [2, 3, 4], "had": 2, "164": 2, "full": [2, 3, 4], "equival": 2, "employe": 2, "ncompens": 2, "benefit": [2, 4], "equit": 2, "recogn": 2, "thrive": [2, 4], "succe": 2, "profession": [2, 4], "health": 2, "awai": 2, "ngrowth": 2, "achiev": [2, 4], "career": 2, "leadership": 2, "influenc": [2, 4], "cultur": 2, "advantag": [2, 3, 4], "being": 2, "nworkplac": 2, "practic": [2, 3], "polici": 2, "equal": 2, "workplac": 2, "harass": 2, "discrimin": 2, "ninclus": 2, "sustain": 2, "workforc": 2, "repres": [2, 4], "serv": [2, 3, 4], "represent": [2, 3], "level": [2, 3, 4], "foster": [2, 4], "nengag": 2, "honest": 2, "among": 2, "everyon": 2, "grow": [2, 4], "encourag": [2, 4], "feedback": [2, 4], "concern": 2, "conduct": 2, "survei": [2, 4], "gaug": 2, "sentiment": [2, 4], "nhealth": 2, "everywher": 2, "measur": 2, "mitig": [2, 3, 4], "possibl": [2, 4], "hazard": 2, "crisi": 2, "put": 2, "place": [2, 4], "visitor": 2, "navail": 2, "quarterli": 2, "q": 2, "amend": 2, "sec": [2, 3, 4], "Such": 2, "charg": 2, "investor": [2, 4], "default": [2, 4], "aspx": 2, "websit": 2, "www": 2, "press": 2, "releas": [2, 4], "environment": 2, "social": 2, "detail": [2, 3, 4], "referenc": 2, "further": [2, 3, 4], "url": [2, 4], "inact": 2, "textual": 2, "unknown": 2, "describ": 2, "below": [2, 3, 4], "materi": [2, 4], "advers": 2, "trend": [2, 4], "conjunct": 2, "consolid": 2, "accompani": 2, "nmacroeconom": 2, "econom": 2, "outsid": 2, "chain": [2, 3], "facil": 2, "assembli": 2, "site": 2, "nadvers": 2, "slow": 2, "recess": 2, "unemploy": 2, "inflat": 2, "tighter": 2, "interest": [2, 3, 4], "currenc": 2, "confid": [2, 4], "spend": 2, "chang": 2, "monetari": 2, "volatil": 2, "incom": 2, "asset": 2, "contract": 2, "logist": 2, "instabl": 2, "inabl": 2, "financ": 2, "insolv": 2, "failur": 2, "deriv": 2, "counterparti": 2, "debt": 2, "reduc": [2, 3, 4], "liquid": [2, 3], "fair": 2, "instrument": 2, "polit": 2, "disput": 2, "geopolit": 2, "tension": 2, "terror": 2, "disast": 2, "accid": 2, "interrupt": 2, "npolit": 2, "whole": 2, "outsourc": 2, "korea": 2, "vietnam": 2, "restrict": [2, 4], "tariff": 2, "export": 2, "good": [2, 4], "portion": 2, "revenu": [2, 3, 4], "raw": [2, 4], "go": [2, 3, 4], "action": [2, 3], "restructur": 2, "ceas": 2, "accord": [2, 4], "disrupt": [2, 3], "announc": 2, "notic": [2, 4], "led": [2, 4], "escal": [2, 3], "sever": [2, 3, 4], "nmani": 2, "prone": 2, "earthquak": 2, "climat": 2, "weather": 2, "occur": 2, "fire": 2, "nuclear": 2, "plant": 2, "terrorist": 2, "attack": 2, "hostil": 2, "ransomwar": 2, "cybersecur": 2, "labor": 2, "beyond": 2, "nsuch": 2, "imposs": 2, "delai": 2, "ineffici": 2, "slowdown": 2, "outag": 2, "neg": [2, 4], "seriou": 2, "injuri": 2, "pandem": 2, "covid": 2, "19": 2, "economi": 2, "imposit": 2, "stringent": 2, "travel": 2, "freight": 2, "movement": 2, "ramp": 2, "nfollow": 2, 
"expenditur": 2, "resum": 2, "lose": 2, "exacerb": 2, "consequ": [2, 4], "insur": 2, "insuffici": 2, "nglobal": 2, "unabl": 2, "There": [2, 3, 4], "assur": 2, "contrast": 2, "minor": 2, "overal": [2, 3, 4], "naddition": 2, "intensifi": 2, "seamlessli": [2, 3], "function": [2, 3, 4], "nto": 2, "remain": [2, 3], "stimul": 2, "ndue": 2, "upgrad": 2, "appropri": [2, 3, 4], "quantiti": 2, "defect": 2, "defici": 2, "supersed": 2, "nsubstanti": 2, "much": 2, "transport": 2, "diminish": 2, "flexibl": [2, 3, 4], "respond": 2, "provis": 2, "reimburs": 2, "warranti": 2, "out": [2, 3, 4], "unanticip": 2, "liabil": 2, "adher": [2, 3, 4], "violat": 2, "final": [2, 3, 4], "finish": 2, "destin": 2, "man": 2, "made": [2, 3, 4], "prepay": 2, "termin": 2, "recover": 2, "exposur": 2, "nfutur": 2, "suffici": [2, 4], "semiconductor": 2, "suffer": 2, "poor": 2, "constrain": [2, 3, 4], "shipment": 2, "altern": [2, 3], "sophist": [2, 3], "unexpectedli": 2, "interfer": 2, "unsaf": 2, "artifici": 2, "intellig": 2, "expos": 2, "inaccur": [2, 4], "fix": [2, 3], "widespread": 2, "vulner": 2, "exploit": 2, "compromis": 2, "claim": 2, "recal": 2, "modif": 2, "off": [2, 3, 4], "intang": 2, "fine": [2, 4], "lost": [2, 3], "cancel": 2, "record": 2, "obsolet": 2, "exce": 2, "realiz": 2, "accru": 2, "excess": 2, "review": [2, 4], "impair": 2, "whenev": 2, "circumst": 2, "amount": [2, 3, 4], "carri": [2, 4], "incur": 2, "given": [2, 3, 4], "unpredict": [2, 4], "pace": 2, "obsolesc": 2, "forecast": 2, "150": 2, "incorrectli": [2, 4], "fulli": [2, 3], "extens": [2, 3, 4], "issuanc": 2, "unknowingli": 2, "notifi": 2, "preclud": 2, "choos": 2, "bui": 2, "percept": 2, "android": 2, "playstat": 2, "nintendo": 2, "xbox": 2, "posit": [2, 3, 4], "less": 2, "inclin": 2, "devot": 2, "compel": [2, 4], "fail": 2, "dissatisfi": 2, "vast": 2, "legal": 2, "storefront": 2, "mechan": [2, 4], "safari": 2, "union": 2, "eu": 2, "dma": 2, "interfac": 2, "reduct": 2, "narrow": 2, "scope": [2, 3], "elimin": 2, "nfailur": 2, "appeal": 2, "subscrib": 2, "nsome": 2, "manner": [2, 3, 4], "nurtur": 2, "distinct": 2, "nmuch": 2, "chief": 2, "especi": [2, 3, 4], "silicon": 2, "vallei": 2, "constantli": 2, "driver": 2, "recruit": 2, "subsidi": 2, "staf": 2, "contractor": 2, "placement": 2, "increment": 2, "weaken": 2, "stop": [2, 3], "telecommun": 2, "war": 2, "virus": 2, "physic": 2, "ins": 2, "incid": 2, "redund": 2, "ineffect": 2, "inadequ": 2, "eventu": 2, "thing": [2, 4], "interf": 2, "imped": 2, "ship": 2, "nloss": 2, "unauthor": 2, "confidenti": 2, "encrypt": 2, "But": [2, 4], "absolut": [2, 4], "malici": 2, "behalf": 2, "gain": 2, "regular": [2, 4], "normal": [2, 4], "investig": 2, "penalti": 2, "judgment": 2, "against": 2, "frequenc": [2, 3], "actor": 2, "circumv": [2, 3], "remov": 2, "obfusc": 2, "forens": 2, "evid": [2, 4], "hinder": [2, 4], "recov": 2, "perpetr": 2, "target": [2, 4], "profil": 2, "authent": 2, "hack": 2, "malfeas": 2, "faulti": 2, "password": 2, "irregular": 2, "fraudul": 2, "induc": 2, "disclos": [2, 3, 4], "usernam": 2, "turn": 2, "multifactor": 2, "unusu": 2, "freez": 2, "suspici": 2, "nwhile": 2, "ninvest": 2, "contempl": 2, "endeavor": 2, "distract": 2, "tangibl": 2, "approv": 2, "oner": 2, "ventur": 2, "riski": 2, "pose": [2, 3, 4], "leas": 2, "unfavor": 2, "arisen": 2, "ordinari": 2, "cours": 2, "resolv": 2, "sometim": [2, 4], "indemnif": 2, "indemnifi": 2, "alleg": 2, "magnitud": 2, "assert": 2, "royalti": 2, "vigor": 2, "defend": 2, "court": 2, "internation": 2, "plaintiff": 2, "injunct": 2, "relief": 2, "nregardless": 
2, "merit": 2, "recognit": 2, "settl": 2, "uncertain": 2, "abov": 2, "disgorg": 2, "remedi": 2, "worldwid": 2, "antitrust": 2, "privaci": [2, 4], "local": [2, 3, 4], "bill": 2, "commerc": 2, "internet": 2, "mobil": [2, 4], "televis": 2, "film": 2, "anticorrupt": 2, "cash": [2, 3], "repatri": 2, "monei": 2, "launder": 2, "tax": 2, "wast": 2, "recycl": 2, "ncomplianc": 2, "impos": [2, 4], "interpret": 2, "ethic": 2, "agent": 2, "found": [2, 4], "nregulatori": 2, "satisfi": 2, "ban": 2, "nexpect": 2, "stakehold": 2, "increasingli": [2, 4], "greenhous": 2, "ga": 2, "emiss": 2, "civil": 2, "disagre": 2, "perceiv": 2, "feder": 2, "vari": 2, "scrutini": 2, "nfrom": 2, "taken": [2, 4], "engag": [2, 4], "noncompli": 2, "individu": [2, 3], "lawsuit": 2, "monopol": 2, "nfurther": 2, "earn": 2, "googl": [2, 4], "search": 2, "nthere": 2, "connect": [2, 4], "retent": 2, "transfer": 2, "pass": [2, 4], "pend": 2, "inquiri": 2, "government": 2, "entiti": [2, 4], "biometr": 2, "breach": 2, "notif": 2, "permit": [2, 4], "healthcar": 2, "liabl": 2, "investigatori": 2, "cardhold": 2, "compress": [2, 3], "acquir": 2, "shift": 2, "mix": [2, 4], "extent": 2, "unexpect": [2, 4], "dollar": 2, "denomin": 2, "rais": [2, 3], "offset": 2, "strengthen": 2, "nconvers": 2, "therebi": [2, 3], "thu": 2, "option": [2, 3, 4], "hedg": 2, "deterior": 2, "sovereign": 2, "heighten": 2, "worsen": 2, "A": [2, 3, 4], "collater": 2, "bank": 2, "unsecur": 2, "subassembli": 2, "assembl": 2, "few": [2, 3, 4], "legisl": 2, "ireland": 2, "singapor": 2, "organis": 2, "propos": 2, "modern": [2, 3, 4], "minimum": 2, "statutori": 2, "valuat": 2, "defer": 2, "bodi": 2, "likelihood": 2, "adequaci": 2, "ultim": 2, "ow": 2, "ngener": 2, "volum": [2, 3], "unrel": 2, "averag": [2, 4], "repurchas": 2, "point": [2, 3], "dividend": 2, "consumm": 2, "declar": 2, "board": 2, "unresolv": 2, "nnone": 2, "threat": 2, "dedic": [2, 4], "postur": 2, "25": 2, "sinc": [2, 3, 4], "2016": 2, "coordin": 2, "assist": [2, 4], "log": 2, "track": 2, "committe": 2, "oversight": 2, "counsel": 2, "chair": 2, "substanc": 2, "17": 2, "headquart": 2, "cupertino": [2, 4], "land": 2, "center": [2, 4], "suitabl": 2, "formal": [2, 4], "articl": [2, 3], "promot": 2, "conclud": 2, "uninstal": 2, "web": 2, "browser": 2, "screen": 2, "june": 2, "24": [2, 4], "preliminari": 2, "find": [2, 3, 4], "contractu": 2, "desist": 2, "stai": [2, 3], "grant": 2, "ndepart": 2, "justic": 2, "21": 2, "depart": 2, "doj": 2, "district": 2, "attornei": 2, "jersei": 2, "redress": 2, "anticompetit": 2, "nonmonetari": 2, "defens": 2, "itself": 2, "nepic": 2, "epic": 2, "northern": 2, "unfair": 2, "guidelin": 2, "enjoin": 2, "extern": 2, "januari": 2, "motion": 2, "enforc": [2, 4], "oppos": 2, "30": 2, "vacat": 2, "fourth": 2, "did": [2, 4], "mine": 2, "nnot": 2, "aapl": 2, "nholder": 2, "na": 2, "23": 2, "301": 2, "npurchas": 2, "nshare": 2, "three": 2, "million": 2, "nperiod": 2, "ttotal": 2, "taverag": 2, "npaid": 2, "publicli": [2, 4], "nannounc": 2, "napproxim": 2, "That": [2, 4], "Be": 2, "nunder": 2, "njune": 2, "august": 2, "nopen": 2, "negoti": 2, "t35": 2, "697": 2, "t224": 2, "naugust": 2, "31": 2, "t42": 2, "910": 2, "t221": 2, "39": 2, "nseptemb": 2, "t33": 2, "653": 2, "t222": 2, "86": 2, "ntotal": 2, "t112": 2, "260": 2, "t89": 2, "074": 2, "110": 2, "billion": 2, "20": [2, 4], "previou": [2, 3, 4], "2023": [2, 4], "10b5": 2, "graph": 2, "show": [2, 3, 4], "comparison": 2, "five": 2, "cumul": 2, "reinvest": 2, "p": [2, 4], "dow": 2, "jone": 2, "supersector": 2, "100": [2, 4], "close": 2, 
"27": 2, "2019": 2, "n2218": 2, "tseptemb": 2, "2021": 2, "2022": 2, "t100": 2, "t207": 2, "t273": 2, "t281": 2, "t322": 2, "t430": 2, "t113": 2, "t156": 2, "t131": 2, "t155": 2, "t210": 2, "ndow": 2, "t146": 2, "t216": 2, "t215": 2, "nfirst": 2, "nsecond": 2, "nthird": 2, "sequoia": 2, "nfourth": 2, "plu": 2, "nfiscal": 2, "six": 2, "realign": 2, "span": 2, "wherea": 2, "indirectli": 2, "tabl": [2, 3, 4], "n2024": 2, "tchang": 2, "t2023": 2, "t2022": 2, "namerica": 2, "t167": 2, "045": 2, "t3": 2, "t162": 2, "560": 2, "t169": 2, "658": 2, "neurop": 2, "t101": 2, "328": 2, "t7": 2, "294": 2, "t95": 2, "118": 2, "ngreater": 2, "t66": 2, "952": 2, "t72": 2, "559": 2, "t74": 2, "200": 2, "njapan": 2, "t25": 2, "052": 2, "t24": 2, "257": 2, "977": 2, "nrest": 2, "t30": 2, "t4": 2, "t29": 2, "615": 2, "t1": 2, "t391": 2, "035": 2, "t2": 2, "t383": 2, "285": 2, "t394": 2, "decreas": 2, "weak": 2, "renminbi": 2, "yen": [2, 4], "22": 2, "categori": 2, "t201": 2, "183": 2, "t200": 2, "583": 2, "t205": 2, "489": 2, "984": 2, "357": 2, "t40": 2, "177": 2, "t26": 2, "694": 2, "t28": 2, "300": [2, 3], "292": 2, "t37": 2, "005": 2, "t39": 2, "845": 2, "t41": 2, "241": 2, "n96": 2, "169": 2, "t13": 2, "t85": 2, "t9": 2, "t78": 2, "129": 2, "amort": 2, "bundl": 2, "flat": 2, "entri": 2, "partial": [2, 3], "ngross": 2, "percentag": 2, "t109": 2, "633": 2, "t108": 2, "803": 2, "t114": 2, "728": 2, "t71": 2, "t60": 2, "345": 2, "t56": 2, "054": 2, "t180": 2, "683": 2, "148": 2, "t170": 2, "782": 2, "t36": 2, "t73": 2, "t70": 2, "t46": 2, "t44": 2, "t43": 2, "save": [2, 3], "noper": 2, "t31": 2, "370": 2, "t5": 2, "915": 2, "t14": 2, "251": 2, "npercentag": 2, "t8": 2, "nsell": 2, "administr": 2, "097": 2, "932": 2, "094": 2, "t6": 2, "t57": 2, "467": 2, "t54": 2, "847": 2, "t51": 2, "t15": 2, "driven": 2, "headcount": 2, "nprovis": 2, "749": 2, "t16": 2, "741": 2, "t19": 2, "neffect": 2, "nstatutori": 2, "t21": 2, "aid": 2, "nliquid": 2, "unrestrict": 2, "140": 2, "ndebt": 2, "97": 2, "payabl": 2, "promissori": 2, "paper": [2, 4], "nleas": 2, "space": 2, "nmanufactur": 2, "noncancel": 2, "ndeem": 2, "2017": 2, "tcja": 2, "paid": 2, "nstate": 2, "fund": 2, "escrow": 2, "ncapit": 2, "95": 2, "nrecent": 2, "pronounc": 2, "nincom": 2, "decemb": 2, "fasb": 2, "asu": 2, "09": [2, 3], "topic": [2, 3, 4], "740": 2, "reconcili": 2, "reconcil": [2, 4], "quantit": 2, "threshold": 2, "disaggreg": 2, "prospect": 2, "novemb": 2, "07": [2, 3, 4], "280": 2, "maker": 2, "codm": 2, "titl": 2, "alloc": 2, "retrospect": 2, "ncritic": 2, "conform": [2, 4], "principl": 2, "gaap": 2, "nuncertain": 2, "domest": 2, "taxat": 2, "adjust": [2, 3, 4], "resolut": 2, "conting": 2, "26": 2, "still": 2, "ninterest": 2, "forth": 2, "hypothet": 2, "nsensit": 2, "nhypothet": 2, "nrate": 2, "npotenti": 2, "n100": 2, "tenor": 2, "ndeclin": 2, "755": 2, "089": 2, "nterm": 2, "nincreas": 2, "t139": 2, "t194": 2, "nforeign": 2, "express": [2, 4], "var": 2, "mont": 2, "carlo": 2, "simul": [2, 4], "maximum": [2, 3], "interv": 2, "538": 2, "669": 2, "underli": [2, 4], "nindex": 2, "tpage": 2, "nconsolid": 2, "n29": 2, "n30": 2, "sheet": 2, "n31": 2, "n32": 2, "n33": 2, "nnote": 2, "n34": 2, "nreport": 2, "n48": 2, "nall": 2, "omit": [2, 4], "submiss": 2, "nyear": 2, "n2023": 2, "n2022": 2, "nnet": 2, "t294": 2, "866": 2, "t298": 2, "085": 2, "t316": 2, "199": 2, "t96": 2, "ncost": 2, "t185": 2, "233": 2, "t189": 2, "282": 2, "471": 2, "119": 2, "855": 2, "t22": 2, "075": 2, "352": 2, "t214": 2, "137": 2, "t223": 2, "546": 2, "t123": 2, "216": 2, 
"t119": 2, "437": 2, "t269": 2, "565": 2, "334": 2, "485": 2, "736": 2, "103": 2, "t93": 2, "995": 2, "t99": 2, "nearn": 2, "nbasic": 2, "ndilut": 2, "08": [2, 4], "343": 2, "783": 2, "744": 2, "231": 2, "215": 2, "963": 2, "095": 2, "812": 2, "547": 2, "325": 2, "819": 2, "nsee": 2, "translat": 2, "t395": 2, "765": 2, "511": 2, "unreal": 2, "832": 2, "t323": 2, "212": 2, "nadjust": 2, "337": 2, "717": 2, "394": 2, "138": 2, "850": 2, "563": 2, "104": 2, "t204": 2, "t253": 2, "816": 2, "899": 2, "272": 2, "t98": 2, "016": 2, "652": 2, "t88": 2, "531": 2, "nasset": 2, "ncurrent": 2, "ncash": 2, "943": 2, "965": 2, "228": 2, "590": 2, "naccount": 2, "410": 2, "508": 2, "nvendor": 2, "t32": 2, "833": 2, "477": 2, "ninventori": 2, "286": 2, "331": 2, "287": 2, "695": 2, "t152": 2, "987": 2, "t143": 2, "566": 2, "t91": 2, "479": 2, "544": 2, "t45": 2, "680": 2, "715": 2, "834": 2, "t64": 2, "758": 2, "t211": 2, "993": 2, "t209": 2, "017": 2, "t364": 2, "980": 2, "t352": 2, "nliabil": 2, "t68": 2, "960": 2, "t62": 2, "611": 2, "304": 2, "t58": 2, "829": 2, "ndefer": 2, "249": 2, "061": 2, "ncommerci": 2, "967": 2, "985": 2, "t10": 2, "912": 2, "822": 2, "t176": 2, "392": 2, "t145": 2, "308": 2, "750": 2, "281": 2, "888": 2, "t49": 2, "848": 2, "638": 2, "t308": 2, "030": 2, "t290": 2, "ncommit": 2, "nsharehold": 2, "400": 2, "116": 2, "786": 2, "550": 2, "n83": 2, "276": 2, "naccumul": 2, "deficit": 2, "154": 2, "214": 2, "172": 2, "452": 2, "950": 2, "146": 2, "t50": 2, "672": 2, "t63": 2, "090": 2, "nbegin": 2, "849": 2, "365": 2, "423": 2, "346": 2, "175": 2, "withheld": 2, "settlement": 2, "award": 2, "521": 2, "971": 2, "t12": 2, "034": 2, "t11": 2, "nend": 2, "t83": 2, "nretain": 2, "068": 2, "562": 2, "ndividend": 2, "218": 2, "793": 2, "612": 2, "099": 2, "454": 2, "846": 2, "77": 2, "046": 2, "186": 2, "109": 2, "t163": 2, "rsu": 2, "t0": 2, "98": 2, "94": 2, "32": 2, "737": 2, "929": 2, "ndepreci": 2, "445": 2, "519": 2, "688": 2, "038": 2, "266": 2, "227": 2, "006": 2, "788": 2, "356": 2, "271": 2, "520": 2, "618": 2, "484": 2, "731": 2, "684": 2, "499": 2, "020": 2, "889": 2, "448": 2, "552": 2, "031": 2, "t118": 2, "254": 2, "t110": 2, "543": 2, "t122": 2, "151": 2, "48": 2, "656": 2, "513": 2, "76": 2, "923": 2, "nproce": 2, "211": 2, "686": 2, "917": 2, "135": 2, "828": 2, "446": 2, "447": 2, "959": 2, "708": 2, "086": 2, "935": 2, "705": 2, "354": 2, "nfinanc": 2, "441": 2, "431": 2, "223": 2, "234": 2, "025": 2, "841": 2, "nrepurchas": 2, "949": 2, "89": 2, "402": 2, "465": 2, "nrepay": 2, "958": 2, "repay": 2, "978": 2, "955": 2, "361": 2, "581": 2, "160": 2, "121": 2, "983": 2, "108": 2, "488": 2, "794": 2, "760": 2, "nsupplement": 2, "102": 2, "t18": 2, "679": 2, "573": 2, "33": 2, "nbasi": 2, "prior": 2, "reclassifi": 2, "nrevenu": 2, "remit": 2, "straight": 2, "vest": 2, "treat": 2, "sold": 2, "nderiv": 2, "combin": [2, 3, 4], "nonleas": 2, "34": 2, "entitl": 2, "reward": 2, "commenc": 2, "deliveri": 2, "stand": 2, "alon": 2, "ssp": 2, "object": [2, 4], "icloud": 2, "siri": 2, "map": [2, 4], "discount": 2, "lack": [2, 4], "undeliv": 2, "unbil": 2, "accordingli": 2, "n26": 2, "n37": 2, "35": 2, "proport": 2, "moder": 2, "64": 2, "dilut": 2, "nnumer": 2, "ndenomin": 2, "nweight": 2, "312": 2, "316": 2, "856": 2, "antidilut": 2, "tunreal": 2, "ngain": 2, "tfair": 2, "nvalu": 2, "tcash": 2, "nequival": 2, "tcurrent": 2, "tnon": 2, "t27": 2, "nlevel": 2, "nmonei": 2, "t778": 2, "nmutual": 2, "n515": 2, "t105": 2, "t617": 2, "nsubtot": 2, "293": 2, "395": 2, "nu": 2, "treasuri": 
2, "516": 2, "t212": 2, "087": 2, "380": 2, "agenc": 2, "159": 2, "t703": 2, "t17": 2, "568": 2, "158": 2, "810": 2, "ncertif": 2, "deposit": 2, "t873": 2, "t387": 2, "t478": 2, "066": 2, "ncorpor": 2, "t65": 2, "622": 2, "t270": 2, "953": 2, "939": 2, "027": 2, "t47": 2, "886": 2, "nmunicip": 2, "t412": 2, "t405": 2, "t190": 2, "nmortgag": 2, "595": 2, "t175": 2, "403": 2, "t23": 2, "367": 2, "278": 2, "t132": 2, "t583": 2, "635": 2, "t128": 2, "056": 2, "966": 2, "t34": 2, "t160": 2, "t688": 2, "650": 2, "36": 2, "359": 2, "t481": 2, "n442": 2, "t428": 2, "t923": 2, "t909": 2, "406": 2, "114": 2, "468": 2, "136": 2, "t271": 2, "533": 2, "048": 2, "491": 2, "332": 2, "t320": 2, "t608": 2, "t76": 2, "840": 2, "956": 2, "890": 2, "t20": 2, "627": 2, "243": 2, "t628": 2, "t602": 2, "t192": 2, "t410": 2, "735": 2, "636": 2, "t344": 2, "t144": 2, "470": 2, "657": 2, "831": 2, "125": 2, "162": 2, "t173": 2, "752": 2, "quot": 2, "corrobor": 2, "mortgag": 2, "classifi": 2, "37": 2, "cross": 2, "swap": 2, "remeasur": 2, "notion": 2, "069": 2, "730": 2, "575": 2, "493": 2, "t104": 2, "777": 2, "nhedg": 2, "433": 2, "505": 2, "247": 2, "ntrade": 2, "41": 2, "44": 2, "depreci": 2, "nland": 2, "690": 2, "nmachineri": 2, "t80": 2, "205": 2, "314": 2, "nleasehold": 2, "839": 2, "128": 2, "599": 2, "73": 2, "70": 2, "884": 2, "852": 2, "t55": 2, "335": 2, "906": 2, "601": 2, "703": 2, "010": 2, "457": 2, "634": 2, "391": 2, "neuropean": 2, "opinion": 2, "1991": 2, "2007": 2, "irish": 2, "branch": 2, "2003": 2, "2014": 2, "2015": 2, "request": [2, 3, 4], "minist": 2, "juli": 2, "annul": 2, "ecj": 2, "hear": 2, "asid": 2, "confirm": 2, "via": [2, 4], "unrecogn": 2, "nfeder": 2, "571": 2, "080": 2, "644": 2, "265": 2, "801": 2, "726": 2, "570": 2, "298": 2, "49": 2, "t84": 2, "428": 2, "603": 2, "483": 2, "t347": 2, "t669": 2, "076": 2, "830": 2, "419": 2, "072": 2, "pretax": 2, "72": 2, "71": 2, "ncomput": 2, "885": 2, "012": 2, "124": 2, "518": 2, "nimpact": 2, "n10": 2, "246": 2, "311": 2, "366": 2, "397": 2, "153": 2, "nexcess": 2, "893": 2, "871": 2, "192": 2, "739": 2, "ntax": 2, "carryforward": 2, "302": 2, "naccru": 2, "413": 2, "421": 2, "nunreal": 2, "173": 2, "168": 2, "873": 2, "743": 2, "nless": 2, "374": 2, "007": 2, "369": 2, "551": 2, "998": 2, "nright": 2, "179": 2, "nminimum": 2, "674": 2, "940": 2, "t511": 2, "t455": 2, "t490": 2, "805": 2, "202": 2, "indefinit": 2, "temporari": 2, "727": 2, "044": 2, "284": 2, "ndecreas": 2, "386": 2, "463": 2, "982": 2, "542": 2, "936": 2, "070": 2, "expir": 2, "statut": 2, "229": 2, "494": 2, "closur": 2, "intercompani": 2, "exceed": 2, "multiyear": 2, "exercis": 2, "noncash": 2, "rou": 2, "tfinanci": 2, "t2024": 2, "tother": 2, "661": 2, "tproperti": 2, "015": 2, "303": 2, "676": 2, "t165": 2, "t752": 2, "t859": 2, "430": 2, "842": 2, "tfinanc": 2, "n2025": 2, "820": 2, "t171": 2, "991": 2, "n2026": 2, "914": 2, "n2027": 2, "t59": 2, "733": 2, "n2028": 2, "360": 2, "t38": 2, "398": 2, "n2029": 2, "187": 2, "nthereaft": 2, "t837": 2, "undiscount": 2, "790": 2, "imput": 2, "376": 2, "534": 2, "t896": 2, "weight": 2, "borrow": 2, "implicit": 2, "readili": 2, "42": 2, "proce": 2, "nine": 2, "00": 2, "nmatur": 2, "333": 2, "264": 2, "948": 2, "645": 2, "309": 2, "arrear": 2, "namount": 2, "n2013": 2, "nfix": 2, "2062": 2, "t97": 2, "341": 2, "03": 2, "65": 2, "t106": 2, "572": 2, "n97": 2, "nunamort": 2, "premium": 2, "321": 2, "358": 2, "113": 2, "662": 2, "convert": [2, 4], "930": 2, "342": 2, "800": 2, "180": 2, "43": 2, "88": 2, "ndure": 2, "425": 2, 
"426": 2, "372": 2, "589": 2, "055": 2, "appreci": 2, "four": 2, "holder": 2, "n2014": 2, "bonu": 2, "nrestrict": 2, "nnumber": 2, "nrsu": 2, "ngrant": 2, "naggreg": 2, "nfair": 2, "nbalanc": 2, "t240": 2, "427": 2, "t75": 2, "t150": 2, "861": 2, "501": 2, "768": 2, "87": 2, "101": 2, "878": 2, "144": 2, "t127": 2, "t135": 2, "91": 2, "456": 2, "78": 2, "59": 2, "t140": 2, "80": 2, "326": 2, "t158": 2, "204": 2, "350": 2, "002": [2, 3], "nuncondit": 2, "uncondit": 2, "206": 2, "440": 2, "156": 2, "t633": 2, "t670": 2, "226": 2, "45": 2, "nconting": 2, "least": 2, "accrual": 2, "nconcentr": 2, "attribut": [2, 4], "46": 2, "t67": 2, "098": 2, "082": 2, "062": 2, "569": 2, "895": 2, "458": 2, "207": 2, "nonrecur": 2, "t142": 2, "196": 2, "t138": 2, "t147": 2, "859": 2, "nchina": 2, "n66": 2, "t181": 2, "887": 2, "t172": 2, "269": 2, "nlong": 2, "664": 2, "n4": 2, "797": 2, "778": 2, "219": 2, "47": 2, "nopinion": 2, "nwe": 2, "fairli": 2, "pcaob": 2, "criteria": 2, "sponsor": 2, "treadwai": 2, "2013": 2, "unqualifi": 2, "thereon": 2, "nthese": 2, "misstat": 2, "fraud": 2, "alter": 2, "ndescript": 2, "naudit": 2, "nhow": 2, "nmatter": 2, "qualifi": 2, "letter": 2, "advisor": 2, "ernst": 2, "young": 2, "llp": 2, "auditor": 2, "2009": 2, "nsan": 2, "jose": 2, "nnovemb": 2, "coso": 2, "nour": 2, "ndefinit": 2, "pertain": 2, "mainten": 2, "accur": [2, 4], "disposit": 2, "receipt": 2, "degre": 2, "nevalu": 2, "nbase": 2, "supervis": 2, "13a": 2, "15d": 2, "summar": [2, 3], "ninher": 2, "met": 2, "appear": [2, 4], "paragraph": 2, "51": [2, 4], "ninsid": 2, "deirdr": 2, "brien": 2, "vice": 2, "presid": 2, "affirm": 2, "april": 2, "withhold": 2, "remitt": 2, "jeff": 2, "william": 2, "mr": 2, "insid": 2, "copi": [2, 3], "exhibit": 2, "solicit": 2, "document": [2, 3, 4], "id": 2, "00042": 2, "nincorpor": 2, "texhibit": 2, "descript": [2, 4], "tform": 2, "tfile": 2, "nrestat": 2, "n8": 2, "namend": 2, "bylaw": 2, "nindentur": 2, "york": [2, 4], "mellon": 2, "truste": 2, "noffic": 2, "certif": 2, "2018": 2, "85": 2, "2043": 2, "05": 2, "2044": 2, "februari": 2, "55": 2, "2045": 2, "900": 2, "700": 2, "60": 2, "250": 2, "2036": 2, "2046": 2, "450": 2, "2047": 2, "2049": 2, "2030": 2, "2050": 2, "2060": 2, "2028": 2, "2041": 2, "2051": 2, "2061": 2, "2032": 2, "2052": 2, "54": 2, "2033": 2, "2053": 2, "n9": 2, "ceo": 2, "n12": 2, "nsubsidiari": 2, "n23": 2, "nconsent": 2, "n24": 2, "npower": 2, "signatur": 2, "nrule": 2, "nsection": 2, "1350": 2, "n101": 2, "ninlin": 2, "xbrl": 2, "n104": 2, "inlin": 2, "compensatori": 2, "herewith": 2, "furnish": 2, "herebi": 2, "undertak": 2, "56": 2, "nsignatur": 2, "npursuant": 2, "duli": 2, "sign": 2, "undersign": 2, "thereunto": 2, "ndate": 2, "nby": 2, "luca": [2, 4], "maestri": 2, "nluca": 2, "nsenior": 2, "nchief": 2, "nknow": 2, "THESE": 2, "whose": 2, "constitut": 2, "appoint": 2, "timothi": 2, "cook": 2, "jointli": 2, "hi": [2, 4], "her": 2, "substitut": 2, "him": 2, "thereto": 2, "therewith": 2, "ratifi": 2, "said": 2, "done": [2, 4], "virtu": 2, "hereof": 2, "nname": 2, "ttitl": 2, "tdate": 2, "tchief": 2, "tnovemb": 2, "ntimothi": 2, "tsenior": 2, "chri": 2, "kondo": 2, "nchri": 2, "wanda": 2, "austin": 2, "nwanda": 2, "alex": 2, "gorski": 2, "tdirector": 2, "nalex": 2, "andrea": 2, "jung": 2, "nandrea": 2, "arthur": 2, "levinson": 2, "narthur": 2, "monica": 2, "lozano": 2, "nmonica": 2, "ronald": 2, "sugar": 2, "nronald": 2, "susan": 2, "l": 2, "wagner": 2, "nsusan": 2, "57": 2, "gpt": [2, 3, 4], "turbo": [2, 3, 4], "invdestacksmeticsisdict": 2, "setispect": 
2, "20cyan": 2, "evaluationseld": 2, "anvis": 2, "droitent": 2, "discernminerv": 2, "versbobprefvers": 2, "vo\u8be5": 2, "option\u548c": 2, "meio": 2, "\u0432\u0440\u0435\u043ccisco": 2, "dellaischenpoihscap": 2, "geme": 2, "gettim": 2, "unscal": 2, "score": [2, 4], "vocabulari": [2, 4], "closer": 2, "sharpen": 2, "uniform": 2, "raschka": 2, "simpl": [2, 3, 4], "dramat": [2, 4], "systemat": [2, 4], "At": [2, 4], "rigid": 2, "wildli": 2, "radic": 2, "grappl": 2, "probabilist": 2, "seem": [2, 4], "safer": 2, "don": [2, 3, 4], "highlight": [2, 3, 4], "paradigm": 2, "anoth": 2, "fascin": 2, "spontan": 2, "answer": [2, 3, 4], "aren": 2, "explicitli": 2, "clear": [2, 4], "wei": 2, "fig": [2, 3, 4], "linear": 2, "absent": 2, "simpli": [2, 3, 4], "coax": 2, "onc": [2, 3], "reach": [2, 3, 4], "journei": 2, "suddenli": 2, "manifest": 2, "call": [2, 3, 4], "phase": 2, "stark": 2, "deliber": 2, "convent": 2, "stabl": 2, "suit": 2, "contend": 2, "7b": 2, "70b": 2, "rethink": 2, "math": 2, "tutor": 2, "children": 2, "verifi": [2, 4], "just": [2, 3, 4], "predefin": [2, 4], "adapt": [2, 3], "explan": [2, 4], "child": 2, "ag": 2, "bound": 2, "weren": 2, "accuraci": [2, 4], "kind": 2, "dimens": 2, "pre": 2, "explicit": [2, 4], "usual": 2, "precis": [2, 4], "resist": 2, "straightforward": [2, 3, 4], "quantif": 2, "contamin": 2, "carefulli": [2, 4], "craft": [2, 4], "massiv": 2, "alreadi": 2, "seen": 2, "memor": 2, "truli": 2, "unseen": 2, "rigor": 2, "evolut": 2, "longitudin": 2, "autom": [2, 4], "annot": 2, "mostli": [2, 4], "versu": 2, "latter": 2, "foundat": [2, 3], "tailor": 2, "solv": [2, 4], "great": [2, 4], "why": [2, 4], "misinform": 2, "factual": 2, "databas": [2, 4], "citat": 2, "tempor": 2, "scientif": 2, "fals": [2, 4], "manipul": 2, "medic": 2, "disclaim": 2, "referr": 2, "boundari": 2, "situat": [2, 3], "incorrect": 2, "expertis": 2, "bia": [2, 4], "gender": 2, "racial": 2, "demograph": 2, "stereotyp": 2, "reinforc": 2, "societ": 2, "pii": 2, "anonym": 2, "leakag": 2, "carryov": 2, "protocol": 2, "cognit": 2, "multi": [2, 4], "mathemat": 2, "fallaci": 2, "causal": 2, "edg": 2, "think": 2, "idiom": 2, "sarcasm": 2, "terminologi": 2, "lingual": 2, "misunderstand": 2, "syntax": 2, "scan": 2, "compat": [2, 4], "stabil": 2, "effici": [2, 3, 4], "scalabl": [2, 3], "meta": [2, 3], "overconfid": 2, "clariti": [2, 3, 4], "audienc": 2, "densiti": 2, "satisfact": [2, 4], "misus": 2, "moral": 2, "transpar": [2, 4], "co2": 2, "energi": 2, "consumpt": 2, "server": [2, 4], "batch": 2, "infer": 2, "imag": 2, "audio": 2, "etc": [2, 4], "truth": [2, 4], "layer": [2, 3, 4], "palm": 2, "shown": 2, "quantifi": 2, "rank": 2, "easi": [2, 3], "synthet": [2, 4], "post": [2, 4], "timeout": 2, "variat": 2, "maxim": 2, "inter": 2, "rater": 2, "priorit": 2, "ti": 2, "tier": 2, "holist": 2, "built": [2, 4], "mind": 2, "x": 2, "fast": 2, "experiment": [2, 4], "iter": [2, 3, 4], "vi": 2, "later": [2, 4], "categor": [2, 4], "intrins": 2, "extrins": 2, "sequenc": [2, 4], "perplex": 2, "downstream": [2, 4], "valuabl": [2, 4], "distinguish": 2, "classif": [2, 4], "true": [2, 3, 4], "synthesi": 2, "discret": 2, "f1": 2, "match": [2, 4], "prefix": 2, "roug": 2, "bleu": 2, "charact": [2, 3, 4], "gram": 2, "bilingu": 2, "understudi": 2, "overlap": [2, 3], "favor": [2, 4], "breviti": 2, "insensit": 2, "semant": [2, 3], "orient": 2, "gist": 2, "sentenc": [2, 3, 4], "ignor": 2, "meteor": 2, "synonym": 2, "stem": [2, 4], "paraphras": 2, "alongsid": 2, "computation": [2, 3], "cider": 2, "consensu": 2, "tf": 2, "idf": 2, "caption": 2, 
"reliant": 2, "corpu": 2, "statist": 2, "ter": 2, "edit": 2, "hypothesi": 2, "penal": 2, "bertscor": 2, "embed": [2, 3], "bert": 2, "spice": 2, "proposit": 2, "scene": 2, "emphasi": 2, "pure": 2, "analyst": [2, 3], "dictionari": [2, 4], "rouge_1": 2, "rouge_2": 2, "ideal": [2, 4], "expert": [2, 3, 4], "cheaper": 2, "4o": [2, 3, 4], "evaluate_summari": 2, "unigram": 2, "bigram": 2, "huggingfac": 2, "librari": [2, 3, 4], "absl": 2, "py": 2, "rouge_scor": 2, "generated_summari": 2, "reference_summari": 2, "arg": [2, 3, 4], "dict": [2, 3, 4], "google_bleu": 2, "bleu_scor": 2, "rouge1": 2, "rouge2": 2, "arbitrari": 2, "chosen": 2, "sentence1": 2, "cat": 2, "sat": 2, "mat": 2, "sentence2": 2, "ate": 2, "3333333333333333": 2, "7272727272727272": 2, "4444444444444445": 2, "generate_summari": 2, "summir": 2, "correspond": [2, 4], "liner": 2, "excerpt": 2, "evaluate_summary_model": 2, "model_benchmark": 2, "models_test": 2, "benchmark_summari": 2, "model_summari": 2, "evaluation_result": 2, "reveal": 2, "analyz": [2, 3, 4], "statu": 2, "concis": 2, "element": [2, 4], "Its": 2, "verbos": 2, "peripher": 2, "quit": [2, 4], "overli": [2, 4], "simplifi": [2, 4], "miss": 2, "convei": [2, 3], "breadth": 2, "Of": 2, "vibe": 2, "visualize_prompt_comparison": 2, "visual": 2, "matplotlib": 2, "radar": 2, "plot": 2, "radar_plot": 2, "tmp": 2, "ipykernel_1652501": 2, "940173201": 2, "userwarn": 2, "figurecanvasagg": 2, "closest": 2, "largest": 2, "deviat": [2, 4], "suggest": [2, 4], "mention": [2, 4], "nuanc": [2, 3, 4], "granular": [2, 3], "fall": 2, "judg": 2, "themselv": 2, "main": [2, 3, 4], "instruct": [2, 3, 4], "tune": [2, 4], "assign": [2, 4], "likert": 2, "style": 2, "pairwis": 2, "ensembl": 2, "repeatedli": 2, "domain": 2, "fluenci": 2, "refin": 2, "excel": [2, 4], "narr": 2, "mirror": 2, "similarli": 2, "notabl": [2, 4], "properli": [2, 4], "henc": 2, "worth": 2, "integ": 2, "rubric": 2, "hollist": 2, "judgeevalu": 2, "grammar": [2, 4], "evaluate_with_llm": 2, "candid": 2, "pars": [2, 4], "criterion": 2, "basemodel": [2, 4], "judge_model": 2, "candidate_summari": 2, "written": 2, "grammat": 2, "y": [2, 4], "z": 2, "w": [2, 3], "beta": [2, 4], "response_format": [2, 4], "Then": 2, "benchmark_model": 2, "test_model": 2, "input_text": [2, 3], "tupl": 2, "trillion": [2, 4], "evals_list": 2, "1775618912": 2, "variant": 2, "slightli": 2, "drift": 2, "lowest": 2, "drop": 2, "gradient": 2, "visibl": 2, "degrad": [2, 4], "firstli": 2, "overhead": 2, "neglect": 2, "prefer": [2, 4], "egocentr": 2, "tight": 2, "field": [2, 4], "aproach": 2, "workflow": [2, 4], "assessor": 2, "aplic": 2, "aim": [2, 3, 4], "clearli": [2, 4], "earlier": 2, "depict": [2, 4], "correl": 2, "multilingu": 2, "golden": 2, "languang": 2, "arena": 2, "blind": 2, "randomli": 2, "pair": 2, "loop": 2, "customiz": 2, "irrelev": 2, "unhelp": 2, "though": [2, 4], "occasion": 2, "rare": 2, "inaccuraci": 2, "perfectli": 2, "cater": 2, "critiqu": 2, "elo": 2, "democrat": [2, 4], "thought": [2, 4], "exam": 2, "probe": 2, "certifi": 2, "histori": 2, "move": [2, 3], "began": 2, "glue": 2, "wang": 2, "entail": 2, "baselin": 2, "superglu": 2, "deeper": [2, 3], "successor": 2, "grew": 2, "big": 2, "bench": 2, "srivastava": 2, "arithmet": 2, "truthfulqa": 2, "lin": [2, 4], "decept": 2, "multitask": 2, "hendryck": 2, "multidisciplinari": 2, "stanford": 2, "helm": 2, "liang": 2, "multidimension": 2, "surround": [2, 4], "emphas": [2, 4], "humanev": 2, "chen": [2, 4], "lmsy": 2, "brought": 2, "dialogu": 2, "len": [2, 3], "replic": [2, 4], "chatbot": 2, 
"chiang": 2, "gather": 2, "alpacaev": 2, "duboi": 2, "mt": 2, "zheng": 2, "Their": [2, 4], "render": 2, "crowdsourc": 2, "livebench": 2, "white": 2, "resili": 2, "meaningfulli": 2, "monthli": 2, "zebralog": 2, "grid": 2, "puzzl": 2, "brailsford": 2, "1999": 2, "lsat": 2, "hous": 2, "clue": 2, "strateg": [2, 4], "deduct": 2, "arriv": 2, "programmat": [2, 4], "2x2": 2, "6x6": 2, "reductio": 2, "ad": [2, 4], "absurdum": 2, "sonnet": [2, 3], "hard": 2, "10b": 2, "counterfactu": 2, "composit": 2, "came": 2, "arc": 2, "prize": 2, "chollet": 2, "mike": 2, "knoop": 2, "founder": 2, "zapier": 2, "fran\u00e7oi": 2, "creator": 2, "agi": 2, "kera": 2, "meaning": [2, 3, 4], "genuin": 2, "old": 2, "possess": 2, "count": [2, 3], "elementari": 2, "novelti": 2, "someth": 2, "wouldn": 2, "interpol": 2, "memori": [2, 3], "synthes": 2, "fly": 2, "brute": 2, "minim": [2, 4], "pixel": 2, "perfect": 2, "color": 2, "unbeaten": 2, "win": 2, "deep": [2, 4], "poorli": 2, "recombin": 2, "spur": 2, "art": 2, "takeawai": 2, "algorithm": 2, "fourrier": 2, "lightweight": [2, 4], "bespok": 2, "sdk": 2, "cli": 2, "extract": [2, 3, 4], "autoregress": 2, "sub": 2, "liter": 2, "disturb": 2, "zero": [2, 4], "varianc": 2, "yt": 2, "ut": 2, "suppos": [2, 4], "exactli": [2, 4], "ol": 2, "heteroscedast": 2, "regress": 2, "wish": 2, "lag": 2, "bivari": 2, "evaluation_track": 2, "evaluationtrack": 2, "model_config": 2, "basemodelconfig": 2, "parallelismmanag": 2, "pipelineparamet": 2, "envconfig": 2, "is_accelerate_avail": 2, "datetim": 2, "timedelta": 2, "initprocessgroupkwarg": 2, "create_evaluation_pipelin": 2, "output_dir": 2, "cache_dir": 2, "pretrain": 2, "dtype": 2, "float16": 2, "max_sampl": 2, "kwargs_handl": 2, "3000": 2, "els": [2, 3], "save_detail": 2, "push_to_hub": 2, "pipeline_param": 2, "launcher_typ": 2, "env_config": 2, "override_batch_s": 2, "use_chat_templ": 2, "trust_remote_cod": 2, "pipeline_paramet": 2, "schemat": [2, 3], "vllm": [2, 4], "tgi": 2, "instanti": 2, "storag": 2, "push": 2, "hub": 2, "parallel": 2, "num_few_shot": 2, "automat": 2, "string": [2, 4], "vertic": 2, "bar": 2, "binari": 2, "flag": 2, "bigbench": 2, "winogrand": 2, "hellaswag": 2, "nlp": 2, "save_and_push_result": 2, "show_result": 2, "model_arg": 2, "remot": 2, "send": [2, 4], "serverless": 2, "inference_server_address": 2, "inference_server_auth": 2, "model_id": 2, "null": 2, "bash": 2, "command": 2, "model_config_path": 2, "path": [2, 3], "endpoint_model": 2, "yaml": [2, 4], "llama3": [2, 3], "qwen2": [2, 4], "smollm2": 2, "3b": 2, "alibaba": [2, 4], "5b": [2, 4], "hui": 2, "yang": 2, "compact": 2, "360m": 2, "allal": 2, "cluster": 2, "noteworthi": 2, "superior": 2, "grain": [2, 4], "salt": [2, 4], "give": 2, "exponenti": 2, "hug": 2, "modular": 2, "visit": 2, "offici": 2, "revisit": 2, "rememb": 2, "api_kei": [2, 3], "trace": 2, "langchain_tracing_v2": 2, "langchain_api_kei": 2, "hf_evalu": 2, "langsmith_evalu": 2, "ls_client": 2, "tobia": 2, "src": 2, "lib": 2, "python3": 2, "tqdm": 2, "auto": 2, "tqdmwarn": 2, "iprogress": 2, "pleas": 2, "jupyt": 2, "ipywidget": 2, "readthedoc": 2, "en": [2, 4], "user_instal": 2, "html": [2, 3, 4], "autonotebook": 2, "notebook_tqdm": 2, "dataset_nam": 2, "create_dataset": 2, "create_exampl": 2, "dataset_id": 2, "calculate_scor": 2, "reference_output": 2, "oai_client": 2, "xp_model_nam": 2, "lastli": 2, "run_evalu": 2, "upload": 2, "And": 2, "upload_result": 2, "experiment_prefix": 2, "num_repetit": 2, "view": 2, "386a3620": 2, "smith": 2, "9e1cc3cb": 2, "9d6a": 2, "4356": 2, "ab34": 2, 
"138e0abe8be4": 2, "8741976e": 2, "5268": 2, "4b75": 2, "949f": 2, "99477dde5d64": 2, "selectedsess": 2, "b831dc1e": 2, "90bc": 2, "4ed8": 2, "8080": 2, "fb42444724d6": 2, "4it": 2, "latest": [2, 3, 4], "modul": [2, 4], "evaluate_modul": 2, "6fc70b7be0088120a372dfdd5d320b39b8bb3630cb8029b193941d9376e86bb0": 2, "tue": 2, "nov": 2, "couldn": 2, "5it": 2, "5053784e": 2, "64445871": 2, "a53c": 2, "44b1": 2, "a422": 2, "4f49b2f9656f": 2, "69": 2, "4b29f3c9": 2, "9ef7e39a": 2, "2add": 2, "410c": 2, "89f8": 2, "9f1a8b198cf1": 2, "61": 2, "df": 2, "to_panda": 2, "insert": 2, "combined_df": 2, "concat": 2, "ignore_index": 2, "execution_tim": 2, "example_id": 2, "333333": 2, "224388": 2, "feb10f92": 2, "3167": 2, "41f3": 2, "bb1c": 2, "d271153a31a8": 2, "5b196b22": 2, "9f4c": 2, "489c": 2, "b020": 2, "7823208b42d6": 2, "348101": 2, "722464": 2, "c310f159": 2, "064a": 2, "4035": 2, "97c3": 2, "a25bbf43abc2": 2, "386076": 2, "704104": 2, "f7f24899": 2, "dd50": 2, "409e": 2, "93cc": 2, "6fb1622b60bf": 2, "443038": 2, "725059": 2, "242856d6": 2, "efb5": 2, "4101": 2, "b1cf": 2, "5805532838ac": 2, "373418": 2, "795302": 2, "ce975169": 2, "a0ab": 2, "40ce": 2, "8e32": 2, "efa28d06079d": 2, "stat": 2, "groupbi": 2, "agg": 2, "std": 2, "round": 2, "sort": 2, "sort_valu": 2, "figur": [2, 4], "subplot": 2, "side": 2, "pyplot": 2, "plt": 2, "numpi": 2, "np": 2, "ax1": 2, "ax2": 2, "figsiz": 2, "2ecc71": 2, "3498db": 2, "e74c3c": 2, "bleu_mean": 2, "bleu_std": 2, "enumer": [2, 3], "errorbar": 2, "yerr": 2, "fmt": 2, "markers": 2, "capsiz": 2, "label": [2, 4], "alpha": [2, 4], "set_ylabel": 2, "set_titl": 2, "set_xtick": 2, "set_xticklabel": 2, "rotat": 2, "set_ylim": 2, "bottom": 2, "axi": 2, "legend": 2, "exec_mean": 2, "exec_std": 2, "tight_layout": 2, "ndetail": 2, "4038": 2, "0453": 2, "7815": 2, "0433": 2, "3768": 2, "0424": 2, "8343": 2, "2208": 2, "3519": 2, "0775": 2, "9122": 2, "1482": 2, "377": 2, "042": 2, "83": 2, "078": 2, "slower": 2, "fastest": 2, "04": [2, 3], "latenc": [2, 3], "speed": 2, "interestingli": 2, "longer": 2, "alb": 2, "loubna": 2, "ben": 2, "anton": 2, "lozhkov": 2, "eli": 2, "bakouch": 2, "gabriel": 2, "mart\u00edn": 2, "bl\u00e1zquez": 2, "lewi": 2, "tunstal": 2, "agust\u00edn": 2, "piquer": 2, "andr": 2, "marafioti": 2, "cyril": 2, "zakka": 2, "leandro": 2, "von": 2, "werra": 2, "thoma": 2, "wolf": 2, "are24": 2, "judgearena": 2, "bps99": 2, "salli": 2, "pott": 2, "barbara": 2, "journal": [2, 4], "557": 2, "sciencedirect": 2, "s0377221798003646": 2, "doi": [2, 4], "org": [2, 4], "1016": 2, "s0377": 2, "2217": 2, "00364": 2, "ctj": 2, "jerri": 2, "tworek": 2, "heewoo": 2, "jun": 2, "qime": 2, "yuan": 2, "henriqu": 2, "pond": 2, "de": 2, "oliveira": 2, "pinto": 2, "jare": 2, "kaplan": 2, "harri": 2, "edward": 2, "yuri": 2, "burda": 2, "nichola": 2, "joseph": 2, "greg": 2, "brockman": 2, "rai": 2, "raul": 2, "puri": 2, "gretchen": 2, "krueger": 2, "michael": [2, 4], "petrov": 2, "heidi": 2, "khlaaf": 2, "girish": 2, "sastri": 2, "pamela": 2, "mishkin": 2, "brook": 2, "chan": 2, "scott": 2, "grai": 2, "nick": 2, "ryder": 2, "mikhail": 2, "pavlov": 2, "alethea": 2, "lukasz": 2, "kaiser": 2, "mohammad": 2, "bavarian": 2, "clemen": 2, "winter": 2, "philipp": 2, "tillet": 2, "felip": 2, "petroski": 2, "dave": 2, "cum": 2, "matthia": 2, "plappert": 2, "fotio": 2, "chantzi": 2, "elizabeth": 2, "barn": 2, "ariel": 2, "herbert": 2, "voss": 2, "hebgen": 2, "guss": 2, "nichol": 2, "paino": 2, "nikola": 2, "tezak": 2, "jie": 2, "tang": 2, "igor": 2, "babuschkin": 2, "suchir": 2, "balaji": 2, 
"shantanu": 2, "jain": 2, "saunder": 2, "christoph": 2, "hess": 2, "andrew": 2, "carr": 2, "jan": 2, "leik": 2, "josh": 2, "achiam": 2, "vedant": 2, "misra": 2, "evan": 2, "morikawa": 2, "alec": 2, "radford": 2, "matthew": 2, "knight": 2, "mile": 2, "brundag": 2, "mira": 2, "murati": 2, "kati": 2, "mayer": 2, "peter": 2, "welind": 2, "bob": [2, 4], "mcgrew": 2, "dario": 2, "amodei": 2, "sam": 2, "mccandlish": 2, "ilya": 2, "sutskev": 2, "wojciech": 2, "zaremba": 2, "arxiv": [2, 4], "ab": [2, 4], "2107": 2, "03374": 2, "cz": 2, "lianmin": 2, "ying": 2, "sheng": 2, "anastasio": 2, "angelopoulo": 2, "tianl": 2, "dacheng": 2, "hao": 2, "zhang": 2, "banghua": 2, "zhu": 2, "jordan": 2, "gonzalez": 2, "ion": 2, "stoica": 2, "2403": 2, "04132": 2, "cho24a": 2, "francoi": 2, "arcpriz": 2, "cho24b": 2, "dglh24": 2, "yann": 2, "bal\u00e1z": 2, "galambosi": 2, "perci": 2, "tatsunori": 2, "hashimoto": 2, "debia": 2, "2404": 2, "04475": 2, "fac24a": 2, "wiki": [2, 4], "fac24b": 2, "fac24c": 2, "doc": [2, 3, 4], "model_doc": 2, "gpt2": 2, "fac24d": 2, "cookbook": 2, "llm_judg": 2, "fac24": 2, "fac24f": 2, "blog": [2, 4], "fhwt23": 2, "cl\u00e9mentin": 2, "nathan": 2, "habib": 2, "hbb": 2, "dan": 2, "collin": 2, "burn": 2, "steven": 2, "basart": 2, "andi": 2, "zou": 2, "manta": 2, "mazeika": 2, "dawn": 2, "song": 2, "jacob": 2, "steinhardt": 2, "03300": 2, "hbd": 2, "ari": 2, "du": 2, "maxwel": 2, "forb": 2, "yejin": 2, "choi": 2, "curiou": 2, "neural": [2, 4], "degener": 2, "1904": 2, "09751": 2, "hyc": 2, "binyuan": 2, "jian": 2, "zeyu": 2, "cui": 2, "jiaxi": 2, "dayiheng": 2, "liu": [2, 4], "lei": 2, "tianyu": 2, "jiajun": 2, "bowen": 2, "yu": 2, "kai": 2, "dang": 2, "coder": 2, "preprint": [2, 4], "2409": 2, "12186": 2, "lx": 2, "zhen": 2, "xiaohan": 2, "xu": 2, "tao": 2, "shen": 2, "jia": 2, "gu": 2, "yuxuan": 2, "lai": 2, "chongyang": 2, "shuai": 2, "ma": 2, "nlg": 2, "2401": 2, "07103": 2, "lbl": 2, "rishi": 2, "bommasani": 2, "toni": 2, "lee": [2, 4], "dimitri": 2, "tsipra": 2, "dilara": 2, "soylu": 2, "michihiro": 2, "yasunaga": 2, "yian": 2, "deepak": 2, "narayanan": 2, "yuhuai": 2, "wu": [2, 4], "ananya": 2, "kumar": 2, "benjamin": 2, "newman": 2, "binhang": 2, "bobbi": 2, "yan": 2, "ce": 2, "christian": 2, "cosgrov": 2, "r\u00e9": 2, "diana": 2, "acosta": 2, "nava": 2, "drew": 2, "hudson": 2, "eric": 2, "zelikman": 2, "esin": 2, "durmu": 2, "faisal": 2, "ladhak": 2, "frieda": 2, "rong": 2, "hongyu": 2, "ren": 2, "huaxiu": 2, "yao": 2, "jue": 2, "keshav": 2, "santhanam": 2, "laurel": 2, "orr": 2, "lucia": 2, "mert": 2, "yuksekgonul": 2, "mirac": 2, "suzgun": 2, "kim": 2, "neel": 2, "guha": 2, "niladri": 2, "chatterji": 2, "omar": 2, "khattab": 2, "henderson": 2, "qian": 2, "huang": 2, "ryan": 2, "chi": [2, 4], "sang": 2, "xie": 2, "shibani": 2, "santurkar": 2, "surya": 2, "ganguli": 2, "icard": 2, "tianyi": 2, "vishrav": 2, "chaudhari": 2, "xuechen": 2, "yifan": 2, "yuhui": 2, "yuta": 2, "koreeda": 2, "2211": 2, "09110": 2, "lbc24": 2, "yuchen": 2, "ronan": 2, "le": 2, "bra": 2, "allenai": 2, "lhe22": 2, "stephani": 2, "hilton": 2, "owain": 2, "mimic": 2, "falsehood": 2, "2109": 2, "07958": 2, "ras24": 2, "sebastian": 2, "scratch": 2, "isbn": 2, "1633437166": 2, "srr": 2, "aarohi": 2, "abhinav": 2, "rastogi": 2, "abhishek": 2, "rao": 2, "abu": 2, "awal": 2, "md": [2, 4], "shoeb": 2, "abubakar": 2, "abid": 2, "adam": 2, "fisch": 2, "brown": 2, "santoro": 2, "aditya": 2, "gupta": 2, "adri\u00e0": 2, "garriga": 2, "alonso": 2, "agnieszka": 2, "kluska": 2, "aitor": 2, "lewkowycz": 2, "akshat": 2, 
"agarw": 2, "warstadt": 2, "alexand": [2, 4], "kocurek": 2, "ali": 2, "safaya": 2, "tazarv": 2, "alic": [2, 4], "xiang": 2, "alicia": 2, "parrish": 2, "allen": 2, "nie": 2, "aman": 2, "hussain": 2, "amanda": 2, "askel": 2, "dsouza": 2, "ambros": 2, "slone": 2, "ameet": 2, "rahan": 2, "anantharaman": 2, "iyer": 2, "ander": 2, "andreassen": 2, "madotto": 2, "santilli": 2, "stuhlm\u00fcller": 2, "la": 2, "lampinen": 2, "angela": 2, "jiang": 2, "angelica": 2, "anh": 2, "vuong": 2, "animesh": 2, "anna": 2, "gottardi": 2, "antonio": 2, "norelli": 2, "anu": 2, "venkatesh": 2, "arash": 2, "gholamidavoodi": 2, "arfa": 2, "tabassum": 2, "arul": 2, "menez": 2, "arun": 2, "kirubarajan": 2, "asher": 2, "mullokandov": 2, "ashish": 2, "sabharw": 2, "herrick": 2, "avia": 2, "efrat": 2, "aykut": 2, "erdem": 2, "ayla": 2, "karaka\u015f": 2, "robert": 2, "bao": 2, "loe": 2, "barret": 2, "zoph": 2, "bart\u0142omiej": 2, "bojanowski": 2, "batuhan": 2, "\u00f6zyurt": 2, "behnam": 2, "hedayatnia": 2, "neyshabur": 2, "inden": 2, "benno": 2, "stein": 2, "berk": 2, "ekmekci": 2, "blake": 2, "howald": 2, "bryan": 2, "orinion": 2, "cameron": [2, 4], "diao": 2, "dour": 2, "catherin": 2, "stinson": 2, "cedrick": 2, "argueta": 2, "c\u00e9sar": 2, "ferri": 2, "ram\u00edrez": 2, "chandan": 2, "singh": 2, "charl": 2, "rathkopf": 2, "chenlin": 2, "meng": 2, "chitta": 2, "baral": 2, "chiyu": 2, "callison": 2, "burch": 2, "wait": 2, "voigt": 2, "cindi": 2, "ramirez": 2, "clara": 2, "rivera": 2, "clemencia": 2, "siro": 2, "colin": 2, "raffel": 2, "courtnei": 2, "ashcraft": 2, "cristina": 2, "garbacea": 2, "damien": 2, "sileo": 2, "garrett": 2, "kilman": 2, "roth": 2, "daniel": 2, "freeman": 2, "khashabi": 2, "levi": 2, "mosegu\u00ed": 2, "gonz\u00e1lez": 2, "perszyk": 2, "danni": 2, "hernandez": 2, "danqi": 2, "daphn": 2, "ippolito": 2, "dar": 2, "gilboa": 2, "david": 2, "dohan": 2, "drakard": 2, "jurgen": 2, "debajyoti": 2, "datta": 2, "deni": 2, "emelin": 2, "kleyko": 2, "deniz": 2, "yuret": 2, "derek": 2, "tam": [2, 4], "dieuwk": 2, "hupk": 2, "diganta": 2, "dilyar": 2, "buzan": 2, "coelho": 2, "mollo": 2, "diyi": 2, "dong": 2, "ho": 2, "dylan": 2, "schrader": 2, "ekaterina": 2, "shutova": 2, "ekin": 2, "dogu": 2, "cubuk": 2, "elad": 2, "segal": 2, "eleanor": 2, "hagerman": 2, "donowai": 2, "elli": 2, "pavlick": 2, "emanuel": 2, "rodola": 2, "emma": 2, "lam": 2, "chu": 2, "erkut": 2, "erni": 2, "ethan": 2, "dyer": 2, "jerzak": 2, "eunic": 2, "engefu": 2, "manyasi": 2, "evgenii": 2, "zheltonozhskii": 2, "fanyu": 2, "xia": 2, "fatemeh": 2, "siar": 2, "fernando": 2, "mart\u00ednez": 2, "plume": 2, "francesca": 2, "happ\u00e9": 2, "gaurav": 2, "mishra": 2, "genta": 2, "indra": 2, "winata": 2, "gerard": 2, "melo": 2, "germ\u00e1n": 2, "kruszewski": 2, "giambattista": 2, "parascandolo": 2, "giorgio": 2, "mariani": 2, "gloria": 2, "gonzalo": 2, "jaimovitch": 2, "l\u00f3pez": 2, "gregor": 2, "betz": 2, "gui": 2, "gur": 2, "hana": 2, "galijasev": 2, "hannah": 2, "rashkin": 2, "hannaneh": 2, "hajishirzi": 2, "harsh": 2, "mehta": 2, "hayden": 2, "bogar": 2, "henri": 2, "shevlin": 2, "hinrich": 2, "sch\u00fctze": 2, "hiromu": 2, "yakura": 2, "hongm": 2, "hugh": 2, "mee": 2, "wong": 2, "ian": 2, "ng": 2, "isaac": 2, "nobl": 2, "jaap": 2, "jumelet": 2, "jack": 2, "geissing": 2, "jackson": 2, "kernion": 2, "jaehoon": 2, "jaim": 2, "fern\u00e1ndez": 2, "fisac": 2, "jame": 2, "simon": 2, "koppel": 2, "koco\u0144": 2, "jana": 2, "thompson": 2, "janel": 2, "wingfield": 2, "jarema": 2, "radom": 2, "jascha": 2, "sohl": 2, "dickstein": 2, 
"jason": 2, "phang": 2, "yosinski": 2, "jekaterina": 2, "novikova": 2, "jell": 2, "bosscher": 2, "jennif": 2, "marsh": 2, "jeremi": 2, "jeroen": 2, "taal": 2, "jess": 2, "engel": 2, "jesujoba": 2, "alabi": 2, "jiacheng": 2, "jiam": 2, "jillian": 2, "joan": 2, "waweru": 2, "john": 2, "burden": 2, "miller": 2, "bali": 2, "jonathan": 2, "batcheld": 2, "berant": 2, "j\u00f6rg": 2, "frohberg": 2, "jo": 2, "rozen": 2, "orallo": 2, "boudeman": 2, "guerr": 2, "joshua": 2, "tenenbaum": 2, "joyc": 2, "chua": 2, "kamil": 2, "kanclerz": 2, "karen": 2, "livescu": 2, "karl": 2, "krauth": 2, "karthik": 2, "gopalakrishnan": 2, "katerina": 2, "ignatyeva": 2, "katja": 2, "markert": 2, "kaustubh": 2, "dhole": 2, "kevin": 2, "gimpel": 2, "omondi": 2, "kori": 2, "mathewson": 2, "kristen": 2, "chiafullo": 2, "ksenia": 2, "shkaruta": 2, "shridhar": 2, "kyle": 2, "mcdonel": 2, "richardson": 2, "laria": 2, "reynold": 2, "leo": 2, "gao": 2, "liam": 2, "dugan": 2, "lianhui": 2, "qin": 2, "lidia": 2, "contrera": 2, "ochando": 2, "loui": 2, "morenc": 2, "moschella": 2, "luci": 2, "ludwig": 2, "schmidt": 2, "luheng": 2, "lui": 2, "olivero": 2, "col\u00f3n": 2, "luke": 2, "metz": 2, "l\u00fctfi": 2, "kerem": 2, "\u015fenel": 2, "maarten": 2, "bosma": 2, "sap": 2, "maartj": 2, "hoev": 2, "maheen": 2, "farooqi": 2, "manaal": 2, "faruqui": 2, "marco": 2, "baturan": 2, "marelli": 2, "maru": 2, "maria": 2, "quintana": 2, "mari": 2, "tolkiehn": 2, "mario": 2, "giulianelli": 2, "martha": 2, "martin": 2, "potthast": 2, "leavitt": 2, "hagen": 2, "m\u00e1ty\u00e1": 2, "schubert": 2, "medina": 2, "orduna": 2, "baitemirova": 2, "melodi": 2, "arnaud": 2, "melvin": 2, "mcelrath": 2, "yee": 2, "cohen": 2, "ivanitskii": 2, "starritt": 2, "strube": 2, "micha\u0142": 2, "sw\u0119drowski": 2, "michel": 2, "bevilacqua": 2, "mihir": 2, "kale": 2, "cain": 2, "mime": 2, "mitch": 2, "walker": 2, "mo": 2, "tiwari": 2, "mohit": 2, "bansal": 2, "moin": 2, "aminnaseri": 2, "mor": 2, "geva": 2, "mozhdeh": 2, "gheini": 2, "mukund": 2, "varma": 2, "nanyun": 2, "peng": 2, "nayeon": 2, "neta": 2, "krakov": 2, "doiron": 2, "nicol": 2, "martinez": 2, "nikita": 2, "nangia": 2, "nikla": 2, "decker": 2, "muennighoff": 2, "nitish": 2, "shirish": 2, "keskar": 2, "niveditha": 2, "noah": 2, "constant": 2, "fiedel": 2, "nuan": 2, "wen": 2, "oliv": 2, "agha": 2, "elbaghdadi": 2, "omer": 2, "moreno": 2, "casar": 2, "parth": 2, "doshi": 2, "pascal": 2, "fung": 2, "paul": 2, "pu": 2, "vicol": 2, "pegah": 2, "alipoormolabashi": 2, "peiyuan": 2, "liao": 2, "eckerslei": 2, "phu": 2, "mon": 2, "htut": 2, "pinyu": 2, "hwang": 2, "piotr": 2, "mi\u0142kowski": 2, "piyush": 2, "patil": 2, "pouya": 2, "pezeshkpour": 2, "priti": 2, "oli": 2, "qiaozhu": 2, "mei": 2, "qing": 2, "lyu": 2, "qinlang": 2, "rabin": 2, "banjad": 2, "rachel": 2, "etta": 2, "rudolph": 2, "raefer": 2, "rahel": 2, "haback": 2, "ramon": 2, "risco": 2, "rapha\u00ebl": 2, "milli\u00e8r": 2, "rhythm": 2, "garg": 2, "rif": 2, "saurou": 2, "riku": 2, "arakawa": 2, "robb": 2, "raymaek": 2, "frank": 2, "rohan": 2, "sikand": 2, "roman": 2, "novak": 2, "sitelew": 2, "lebra": 2, "rosann": 2, "rowan": 2, "rui": [2, 4], "ruslan": 2, "salakhutdinov": 2, "stoval": 2, "teehan": 2, "rylan": 2, "sahib": 2, "saif": 2, "sajant": 2, "anand": 2, "dillav": 2, "shleifer": 2, "wiseman": 2, "samuel": 2, "gruetter": 2, "bowman": 2, "schoenholz": 2, "sanghyun": 2, "han": 2, "sanjeev": 2, "kwatra": 2, "sarah": 2, "sarik": 2, "ghazarian": 2, "sayan": 2, "ghosh": 2, "sean": 2, "casei": 2, "bischoff": 2, "gehrmann": 2, "schuster": 2, 
"sepideh": 2, "sadeghi": 2, "shadi": 2, "hamdan": 2, "sharon": 2, "zhou": 2, "shashank": 2, "sherri": 2, "shi": 2, "shikhar": 2, "shima": 2, "asaadi": 2, "shixiang": 2, "shane": 2, "shubh": 2, "pachchigar": 2, "shubham": 2, "toshniw": 2, "shyam": 2, "upadhyai": 2, "shyamolima": 2, "debnath": 2, "siamak": 2, "shakeri": 2, "thormey": 2, "melzi": 2, "siva": 2, "reddi": 2, "sneha": 2, "priscilla": 2, "makini": 2, "soo": 2, "hwan": 2, "spencer": 2, "toren": 2, "sriharsha": 2, "hatwar": 2, "stanisla": 2, "dehaen": 2, "stefan": 2, "divic": 2, "stefano": 2, "ermon": 2, "stella": 2, "biderman": 2, "stephen": 2, "prasad": 2, "piantadosi": 2, "stuart": 2, "shieber": 2, "summer": 2, "misherghi": 2, "svetlana": 2, "kiritchenko": 2, "swaroop": 2, "tal": 2, "linzen": 2, "tariq": 2, "tatsu": 2, "te": 2, "th\u00e9o": 2, "desbord": 2, "theodor": 2, "rothschild": 2, "phan": 2, "tiberiu": 2, "nkinyili": 2, "timo": 2, "schick": 2, "timofei": 2, "kornev": 2, "titu": 2, "tunduni": 2, "gerstenberg": 2, "trenton": 2, "trishala": 2, "neeraj": 2, "tushar": 2, "khot": 2, "tyler": 2, "shultz": 2, "uri": 2, "shaham": 2, "vera": 2, "demberg": 2, "victoria": 2, "nyamai": 2, "vika": 2, "raunak": 2, "vinai": 2, "ramasesh": 2, "udai": 2, "prabhu": 2, "vishakh": 2, "padmakumar": 2, "vivek": 2, "srikumar": 2, "fedu": 2, "wout": 2, "vossen": 2, "xiaoyu": 2, "tong": 2, "xinran": 2, "zhao": 2, "xinyi": 2, "xudong": 2, "yadollah": 2, "yaghoobzadeh": 2, "yair": 2, "lakretz": 2, "yangqiu": 2, "yasaman": 2, "bahri": 2, "yichi": 2, "yide": 2, "yifu": 2, "yonatan": 2, "belinkov": 2, "hou": 2, "yufang": 2, "yuntao": 2, "bai": 2, "zachari": 2, "seid": 2, "zhuoy": 2, "zijian": 2, "ziji": 2, "j": [2, 4], "zirui": 2, "ziyi": 2, "extrapol": 2, "2206": 2, "04615": 2, "wpn": 2, "yada": 2, "pruksachatkun": 2, "amanpreet": 2, "julian": 2, "felix": 2, "hill": 2, "stickier": 2, "wsm": 2, "1804": 2, "07461": 2, "wtb": 2, "yi": [2, 4], "tai": 2, "borgeaud": 2, "dani": 2, "yogatama": 2, "denni": 2, "donald": 2, "metzler": 2, "ed": 2, "h": 2, "oriol": 2, "vinyal": 2, "dean": 2, "07682": 2, "wdr": 2, "doolei": 2, "manlei": 2, "arka": 2, "pal": 2, "feuer": 2, "siddhartha": 2, "ravid": 2, "shwartz": 2, "ziv": 2, "khalid": 2, "saifullah": 2, "siddartha": 2, "naidu": 2, "chinmai": 2, "hegd": 2, "lecun": 2, "tom": 2, "goldstein": 2, "willi": 2, "neiswang": 2, "micah": 2, "goldblum": 2, "2406": 2, "19314": 2, "yyh": 2, "baosong": 2, "bo": 2, "chengpeng": 2, "chengyuan": 2, "fei": 2, "guant": 2, "haoran": 2, "huan": 2, "jialong": 2, "jialin": 2, "jianhong": 2, "tu": 2, "jianwei": 2, "jianxin": 2, "jin": 2, "jingren": 2, "jinz": 2, "jinzheng": 2, "junyang": 2, "keme": 2, "lu": 2, "keqin": 2, "kexin": 2, "mingfeng": 2, "xue": 2, "ni": 2, "pei": 2, "ru": 2, "men": 2, "ruiz": 2, "runji": 2, "shiji": 2, "sinan": 2, "tan": 2, "tianhang": 2, "tianhao": 2, "wenbin": 2, "ge": 2, "xiaodong": 2, "deng": 2, "xiaohuan": 2, "xingzhang": 2, "xinyu": 2, "xipin": 2, "xuancheng": 2, "fan": 2, "yichang": 2, "wan": 2, "yunfei": 2, "yuqiong": 2, "zhenru": 2, "zhihao": 2, "2407": 2, "10671": 2, "zc": 2, "siyuan": 2, "zhuang": 2, "zhanghao": 2, "yonghao": 2, "zi": 2, "zhuohan": 2, "xing": 2, "2306": 2, "05685": 2, "huggingface24": 2, "06": [2, 4], "metaai24": 2, "promptfoo24": 2, "toolkit": 2, "dev": 2, "far": 3, "possibli": 3, "eliot": 3, "english": 3, "thumb": 3, "\u00be": 3, "max_output_token": 3, "4096": 3, "16384": 3, "contrari": 3, "surpass": 3, "truncat": 3, "max_input_token": 3, "input_cost_per_token": 3, "output_cost_per_token": 3, "11b": 3, "v1": 3, "128000": 3, "5e": 3, 
"20241022": 3, "8192": 3, "200000": 3, "3e": 3, "0613": 3, "6e": 3, "1e": 3, "gemini": 3, "flash": 3, "1048576": 3, "2097152": 3, "05e": 3, "incomplet": 3, "abruptli": 3, "shallow": 3, "thorough": 3, "dissatisfact": 3, "frustrat": 3, "creation": 3, "feasibl": 3, "split": 3, "10k": 3, "diagram": 3, "charactertextsplitt": 3, "tiktoken": 3, "sequenti": 3, "newlin": 3, "broadli": [3, 4], "want": [3, 4], "sure": [3, 4], "cheap": 3, "speciali": 3, "naiv": 3, "nltk": 3, "spaci": 3, "recurs": 3, "divid": 3, "hierarch": 3, "talk": 3, "theme": 3, "splitter": 3, "markdown": 3, "get_chunk": 3, "chunk_siz": 3, "chunk_overlap": 3, "langchain_text_splitt": 3, "text_splitt": 3, "from_tiktoken_encod": 3, "split_text": 3, "persona": 3, "task": [3, 4], "langchain_cor": [3, 4], "prompttempl": 3, "get_base_prompt_templ": 3, "base_prompt": [3, 4], "from_templ": 3, "llmchain": 3, "togeth": 3, "parser": [3, 4], "output_pars": 3, "stroutputpars": 3, "langchain_commun": 3, "chat_model": 3, "chatlitellm": 3, "get_llm_chain": 3, "prompt_templ": [3, 4], "llm_chain": [3, 4], "api_key_label": 3, "upper": 3, "_api_kei": 3, "get_dynamic_prompt_templ": 3, "get_dynamic_prompt_param": 3, "prompt_param": 3, "part_idx": 3, "total_part": 3, "chat_context": 3, "param": 3, "dynamic_prompt_param": 3, "elif": 3, "merg": 3, "concaten": 3, "generate_report": 3, "input_cont": 3, "llm_model_nam": 3, "report_part": 3, "num_part": 3, "dinam": 3, "priovid": 3, "invok": [3, 4], "cummul": 3, "join": 3, "max_chunk_s": 3, "max_chunk_overlap": 3, "readabl": 3, "apple_report": 3, "luation": 3, "disciplin": 3, "smooth": 3, "subhead": 3, "despit": [3, 4], "depth": 3, "overlook": 3, "preserv": 3, "easier": [3, 4], "preprocess": [3, 4], "necessit": 3, "meticul": 3, "bottleneck": 3, "friendli": 3, "mustafa": 3, "suleyman": 3, "infinit": 3, "fewer": 3, "progress": 3, "condens": 3, "versatil": 3, "drive": [3, 4], "grace": 3, "fallback": 3, "empow": 3, "crucial": [3, 4], "langchain24": 3, "how_to": 3, "freedom": 4, "julia": 4, "easili": 4, "notebook": 4, "overrid": 4, "response_cont": 4, "wow": 4, "lot": 4, "breakdown": 4, "impress": 4, "huge": 4, "ye": 4, "serious": 4, "is_json": 4, "myjson": 4, "valueerror": 4, "trial": 4, "elicit": 4, "wrangl": 4, "hoc": 4, "streamlin": 4, "subsequ": 4, "dataset": 4, "unwant": 4, "ui": 4, "overflow": 4, "overwhelm": 4, "twitter": 4, "youtub": 4, "publish": 4, "schema": 4, "blueprint": 4, "nativ": 4, "json_format": 4, "person1": 4, "q1": 4, "person2": 4, "nest": 4, "todai": 4, "thellm": 4, "unend": 4, "whitespac": 4, "forget": 4, "throw": 4, "somewher": 4, "json_object": 4, "sheer": 4, "circul": 4, "vertex": 4, "worri": 4, "enum": 4, "refus": 4, "simpler": 4, "strongli": 4, "secextract": 4, "mentioned_ent": 4, "mentioned_plac": 4, "extract_from_sec_fil": 4, "sec_filing_text": 4, "hint": 4, "prompt_extract": 4, "sec_extract": 4, "washington": 4, "usabl": 4, "beg": 4, "with_structured_output": 4, "runnabl": 4, "typeddict": 4, "qu": 4, "langchain_openai": 4, "chatopenai": 4, "chatprompttempl": 4, "extract_from_sec_filing_langchain": 4, "structured_llm": 4, "from_messag": 4, "sec_extraction_langchain": 4, "hood": 4, "logit": 4, "willard": 4, "louf": 4, "reformul": 4, "finit": 4, "fsm": 4, "s_": 4, "sim": 4, "s_t": 4, "theta": 4, "s_1": 4, "v": 4, "mathbb": 4, "mask": 4, "tild": 4, "odot": 4, "rightarrow": 4, "boolean": 4, "wise": 4, "formul": 4, "regex": 4, "tran": 4, "thien": 4, "automaton": 4, "dfa": 4, "decod": 4, "outgo": 4, "renorm": 4, "yy": 4, "nn": 4, "ever": 4, "aa": 4, "lwai": 4, "prop": 4, "yynnaa": 4, "qwen": 
4, "malform": 4, "sec_extraction_outlin": 4, "zsp": 4, "zicorp": 4, "phenomenon": 4, "popular": 4, "cpp": 4, "gbnf": 4, "ggml": 4, "bnf": 4, "ggerganov": 4, "accomplish": 4, "backu": 4, "naur": 4, "wikipedia": 4, "contributor": 4, "strictli": 4, "soon": 4, "curl": 4, "fssl": 4, "sh": 4, "extract_entities_from_sec_fil": 4, "suffix": 4, "ollama_structured_output_prompt_suffix": 4, "ollama_structured_output_temperatur": 4, "mistral": 4, "llama2": 4, "uncensor": 4, "model_json_schema": 4, "response_json": 4, "wrapper": 4, "exllama2": 4, "mlx": 4, "lm": 4, "medium": 4, "know": 4, "chanc": 4, "correctli": 4, "famili": 4, "furthermor": 4, "nonetheless": 4, "studi": 4, "wrap": 4, "gemma": 4, "uncov": 4, "wors": 4, "extran": 4, "dispar": 4, "preval": 4, "outdat": 4, "rapidli": 4, "fashion": 4, "remark": 4, "me": 4, "speak": 4, "freeli": 4, "aider": 4, "outweigh": 4, "rebutt": 4, "argu": 4, "reproduct": 4, "paint": 4, "pictur": 4, "verif": 4, "dottxt": 4, "flaw": 4, "uneven": 4, "didn": 4, "conflat": 4, "argument": 4, "drawback": 4, "unlock": 4, "wider": 4, "thank": 4, "pfiffer": 4, "aid24": 4, "dot24": 4, "sai": 4, "demo": 4, "tree": 4, "gge24": 4, "blob": 4, "readm": 4, "llf": 4, "xieyang": 4, "frederick": 4, "fiannaca": 4, "terri": 4, "koo": 4, "dixon": 4, "cai": 4, "ea": 4, "ny": 4, "usa": 4, "machineri": 4, "1145": 4, "3613905": 4, "3650756": 4, "ln": 4, "xuan": 4, "hai": 4, "nguyen": 4, "ngoc": 4, "tiviati": 4, "hieu": 4, "dao": 4, "shafiq": 4, "joti": 4, "kenji": 4, "kawaguchi": 4, "nanci": 4, "min": 4, "kan": 4, "2408": 4, "08656": 4, "out24": 4, "twt": 4, "zhi": 4, "cheng": 4, "kuang": 4, "tsai": 4, "chieh": 4, "hung": 4, "yun": 4, "nung": 4, "02442": 4, "tt24": 4, "vivien": 4, "vivien000": 4, "wl23": 4, "brandon": 4, "r\u00e9mi": 4, "2307": 4, "09702": 4, "wikipediacontributors24": 4, "wiktionari": 4, "naur_form": 4}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"introduct": [0, 1, 4], "content": [0, 2, 3, 4], "core": 0, "challeng": 0, "we": 0, "ll": 0, "address": 0, "A": [0, 1], "practic": [0, 1, 4], "approach": 0, "note": 0, "perspect": 0, "who": 0, "thi": 0, "book": 0, "i": 0, "For": 0, "outcom": 0, "prerequisit": 0, "set": 0, "up": 0, "your": 0, "environ": 0, "python": 0, "setup": 0, "api": [0, 4], "kei": [0, 2, 3], "configur": 0, "code": 0, "repositori": 0, "troubleshoot": 0, "common": 0, "issu": 0, "about": 0, "author": 0, "": 0, "tame": 1, "llm": [1, 2], "guid": 1, "pitfal": 1, "open": 1, "sourc": 1, "softwar": [1, 2], "chapter": 1, "1": [1, 3], "2": [1, 3], "wrestl": [1, 4], "structur": [1, 4], "output": [1, 3, 4], "3": [1, 3], "input": 1, "size": [1, 3], "length": [1, 3], "limit": [1, 3], "4": [1, 3], "5": 1, "The": [1, 2], "eval": [1, 2], "gap": [1, 2], "6": 1, "hallucin": 1, "realiti": 1, "7": 1, "safeti": 1, "concern": 1, "8": 1, "cost": [1, 3], "factor": 1, "9": 1, "break": 1, "free": 1, "from": 1, "cloud": 1, "provid": [1, 4], "appendix": 1, "tool": [1, 2, 4], "resourc": 1, "non": 2, "determinist": 2, "gener": [2, 3], "machin": 2, "temperatur": 2, "sampl": 2, "spectrum": 2, "emerg": 2, "properti": 2, "problem": [2, 3, 4], "statement": [2, 3, 4], "tradit": 2, "v": 2, "design": 2, "applic": 2, "test": 2, "requir": 2, "matrix": 2, "conceptu": 2, "overview": 2, "consider": [2, 3], "metric": 2, "evalu": 2, "task": 2, "model": [2, 3], "base": [2, 3], "human": 2, "benchmark": 2, "leaderboard": 2, "lightev": 2, "mmlu": 2, "econometr": 2, "dataset": 2, "famili": 2, "us": 2, "langsmith": 2, "promptfoo": 2, "refer": [2, 3, 4], "what": 3, "ar": 3, "token": 3, 
"comparison": [3, 4], "across": 3, "chunk": 3, "contextu": 3, "link": 3, "long": 3, "form": 3, "step": 3, "write": 3, "prompt": [3, 4], "templat": 3, "construct": 3, "dynam": 3, "paramet": 3, "report": 3, "exampl": 3, "usag": 3, "discuss": [3, 4], "implic": 3, "futur": 3, "conclus": [3, 4], "user": 4, "need": 4, "solut": 4, "strategi": 4, "techniqu": 4, "One": 4, "shot": 4, "specif": 4, "json": 4, "mode": 4, "langchain": 4, "outlin": 4, "ollama": 4, "compar": 4, "framework": 4, "best": 4, "research": 4, "ongo": 4, "debat": 4, "acknowledg": 4}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinxcontrib.bibtex": 9, "sphinx": 57}, "alltitles": {"Introduction": [[0, "introduction"], [4, "introduction"]], "Contents": [[0, "contents"], [2, "contents"], [3, "contents"], [4, "contents"]], "Core Challenges We\u2019ll Address": [[0, "core-challenges-we-ll-address"]], "A Practical Approach": [[0, "a-practical-approach"]], "A Note on Perspective": [[0, "a-note-on-perspective"]], "Who This Book Is For": [[0, "who-this-book-is-for"]], "Outcomes": [[0, "outcomes"]], "Prerequisites": [[0, "prerequisites"]], "Setting Up Your Environment": [[0, "setting-up-your-environment"]], "Python Environment Setup": [[0, "python-environment-setup"]], "API Keys Configuration": [[0, "api-keys-configuration"]], "Code Repository": [[0, "code-repository"]], "Troubleshooting Common Issues": [[0, "troubleshooting-common-issues"]], "About the Author(s)": [[0, "about-the-author-s"]], "Taming LLMs": [[1, "taming-llms"]], "A Practical Guide to LLM Pitfalls with Open Source Software": [[1, "a-practical-guide-to-llm-pitfalls-with-open-source-software"]], "Chapter 1: Introduction": [[1, "chapter-1-introduction"]], "Chapter 2: Wrestling with Structured Output": [[1, "chapter-2-wrestling-with-structured-output"]], "Chapter 3: Input Size and Length Limitations": [[1, "chapter-3-input-size-and-length-limitations"]], "Chapter 4: Output Size and Length Limitations": [[1, "chapter-4-output-size-and-length-limitations"]], "Chapter 5: The Evals Gap": [[1, "chapter-5-the-evals-gap"]], "Chapter 6: Hallucination: The Reality Gap": [[1, "chapter-6-hallucination-the-reality-gap"]], "Chapter 7: Safety Concerns": [[1, "chapter-7-safety-concerns"]], "Chapter 8: The Cost Factor": [[1, "chapter-8-the-cost-factor"]], "Chapter 9: Breaking Free from Cloud Providers": [[1, "chapter-9-breaking-free-from-cloud-providers"]], "Appendix A: Tools and Resources": [[1, "appendix-a-tools-and-resources"]], "The Evals Gap": [[2, "the-evals-gap"]], "Non-Deterministic Generative Machines": [[2, "non-deterministic-generative-machines"]], "Temperature and Sampling": [[2, "temperature-and-sampling"]], "The Temperature Spectrum": [[2, "the-temperature-spectrum"]], "Emerging Properties": [[2, "emerging-properties"]], "Problem Statement": [[2, "problem-statement"], [3, "problem-statement"], [4, "problem-statement"]], "Evals of Traditional Software vs LLMs": [[2, "evals-table"]], "Evals Design": [[2, "evals-design"]], "LLM Application Testing Requirements Matrix": [[2, "validation-requirements"]], "Conceptual Overview": [[2, "conceptual-overview"]], "Design Considerations": [[2, "design-considerations"]], "Metrics": [[2, "metrics"]], "Key Metrics for Evaluating Generative Tasks": [[2, "key-metrics"]], 
"Evaluators": [[2, "evaluators"]], "Model-Based Evaluation": [[2, "model-based-evaluation"]], "Human-Based Evaluation": [[2, "human-based-evaluation"]], "Evaluating Evaluators": [[2, "evaluating-evaluators"]], "Benchmarks and Leaderboards": [[2, "benchmarks-and-leaderboards"]], "Tools": [[2, "tools"]], "LightEval": [[2, "lighteval"]], "MMLU Econometrics Task Dataset sample": [[2, "mmlu-econometrics"]], "Model Families Evaluated Using LightEval": [[2, "model-families"]], "LangSmith": [[2, "langsmith"]], "PromptFoo": [[2, "promptfoo"]], "References": [[2, "references"], [3, "references"], [4, "references"]], "Output Size Limitations": [[3, "output-size-limitations"]], "What are Token Limits?": [[3, "what-are-token-limits"]], "Token Cost and Length Limitation Comparison Across Key Models": [[3, "token-cost-table"]], "Content Chunking with Contextual Linking": [[3, "content-chunking-with-contextual-linking"]], "Generating long-form content": [[3, "generating-long-form-content"]], "Step 1: Chunking the Content": [[3, "step-1-chunking-the-content"]], "Step 2: Writing the Base Prompt Template": [[3, "step-2-writing-the-base-prompt-template"]], "Step 3: Constructing Dynamic Prompt Parameters": [[3, "step-3-constructing-dynamic-prompt-parameters"]], "Step 4: Generating the Report": [[3, "step-4-generating-the-report"]], "Example Usage": [[3, "example-usage"]], "Discussion": [[3, "discussion"], [4, "discussion"]], "Implications": [[3, "implications"]], "Future Considerations": [[3, "future-considerations"]], "Conclusion": [[3, "conclusion"], [4, "conclusion"]], "Wrestling with Structured Output": [[4, "wrestling-with-structured-output"]], "User Needs": [[4, "user-needs"]], "Solutions": [[4, "solutions"]], "Strategies": [[4, "strategies"]], "Techniques and Tools": [[4, "techniques-and-tools"]], "One-Shot Prompts": [[4, "one-shot-prompts"]], "Structured Output with Provider-Specific APIs": [[4, "structured-output-with-provider-specific-apis"]], "JSON Mode": [[4, "json-mode"]], "LangChain": [[4, "langchain"]], "Outlines": [[4, "outlines"]], "Ollama": [[4, "ollama"]], "Comparing Solutions": [[4, "comparing-solutions"]], "Structured Output Frameworks Comparison": [[4, "structured-output-frameworks"]], "Best Practices": [[4, "best-practices"]], "Research and Ongoing Debate": [[4, "research-and-ongoing-debate"]], "Acknowledgements": [[4, "acknowledgements"]]}, "indexentries": {}}) \ No newline at end of file diff --git a/tamingllms/_build/jupyter_execute/markdown/intro.ipynb b/tamingllms/_build/jupyter_execute/markdown/intro.ipynb index d351568..c759f80 100644 --- a/tamingllms/_build/jupyter_execute/markdown/intro.ipynb +++ b/tamingllms/_build/jupyter_execute/markdown/intro.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "dd8c65f3", + "id": "5a67fb7d", "metadata": {}, "source": [ "(intro)=\n", diff --git a/tamingllms/_build/jupyter_execute/notebooks/evals.ipynb b/tamingllms/_build/jupyter_execute/notebooks/evals.ipynb index be604ec..390ab38 100644 --- a/tamingllms/_build/jupyter_execute/notebooks/evals.ipynb +++ b/tamingllms/_build/jupyter_execute/notebooks/evals.ipynb @@ -1244,6 +1244,8 @@ "\n", "A major challenge with these leaderboards and benchmarks is test set contamination - when test data ends up in newer models' training sets, rendering the benchmarks ineffective. While some benchmarks try to address this through crowdsourced prompts and evaluations from humans or LLMs, these approaches introduce their own biases and struggle with difficult questions. 
**LiveBench** {cite}`white2024livebenchchallengingcontaminationfreellm` represents a novel solution, designed specifically to be resilient to both contamination and evaluation biases. As the first benchmark with continuously updated questions from recent sources, automated objective scoring, and diverse challenging tasks across multiple domains, LiveBench maintains its effectiveness even as models improve. Drawing from recent math competitions, research papers, news, and datasets, it creates contamination-free versions of established benchmark tasks. Current results show even top models achieving below 70% accuracy, demonstrating LiveBench's ability to meaningfully differentiate model capabilities. With monthly updates and an open collaborative approach, LiveBench aims to provide sustained value for model evaluation as the field advances.\n", "\n", + "Another notable benchmark is ZebraLogic {cite}`zebralogic2024`, which evaluates logical reasoning capabilities of LLMs through Logic Grid Puzzles - a type of Constraint Satisfaction Problem {cite}`brailsford1999constraint` commonly found in tests like the LSAT. These puzzles require assigning unique values to N houses across M different features based on given clues, demanding strategic reasoning and deduction to arrive at a unique correct solution. The benchmark's programmatically generated puzzles range from 2x2 to 6x6 in size and test LLMs using one-shot examples with reasoning steps. While humans can solve these puzzles through strategic methods like reductio ad absurdum and elimination, LLMs demonstrate significant limitations in this type of logical reasoning. Even the best-performing model, Claude 3.5 Sonnet, only achieves 33.4% accuracy across all puzzles and 12.4% on hard puzzles, with smaller models (7-10B parameters) solving less than 1% of hard puzzles as of December 2024. These results reveal critical gaps in LLMs' capabilities around counterfactual thinking, reflective reasoning, structured memorization, and compositional generalization.\n", + "\n", "A significant shift in AI evaluation came with the launch of the **The Alignment Research Center (ARC) Prize** {cite}`arcprize2024` by ARC Prize Inc., a non-profit for the public advancement of open artificial general intelligence. Hosted by Mike Knoop (Co-founder, Zapier) and François Chollet (Creator of ARC-AGI, Keras), this prize represents a paradigm shift in how we evaluate language models. Rather than focusing on narrow performance metrics, the ARC Prize assesses what it calls \"cognitive sufficiency\" - a model's ability to generate meaningful insights and tackle open-ended challenges. This new way to think about LLM evaluation emphasizes creative thinking, sophisticated reasoning, and the capacity to make genuinely useful contributions to human knowledge as we seek to define and measure what it means to achieve AGI (Artificial General Intelligence).\n", "\n", "\n", diff --git a/tamingllms/_build/jupyter_execute/notebooks/structured_output.ipynb b/tamingllms/_build/jupyter_execute/notebooks/structured_output.ipynb index 4370dc4..4845bac 100644 --- a/tamingllms/_build/jupyter_execute/notebooks/structured_output.ipynb +++ b/tamingllms/_build/jupyter_execute/notebooks/structured_output.ipynb @@ -637,18 +637,103 @@ "source": [ "### Outlines\n", "\n", - "Outlines {cite}`outlines2024` is a library specifically focused on structured text generation from LLMs. 
Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. This provides fine-grained control over the model's generation process. In that way, Outlines provides several powerful features:\n", + "Outlines {cite}`outlines2024` is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. \n", "\n", - "* **Multiple Choice Generation**: Restrict the LLM output to a predefined set of options.\n", - "* **Regex-based structured generation**: Guide the generation process using regular expressions.\n", - "* **Pydantic model**: Ensure the LLM output follows a Pydantic model.\n", - "* **JSON Schema**: Ensure the LLM output follows a JSON Schema." + "The authors solve the general guided generation problem {cite}`willard2023efficientguidedgenerationlarge`, which as a consequence solves the problem of structured output generation, in LLMs by introducing an efficient indexing approach that reformulates neural text generation using finite-state machines (FSMs).\n", + "\n", + "They define the next token generation as a random variable:\n", + "\n", + "$$s_{t+1} \\sim \\text{Categorical}(\\alpha) \\text{ where } \\alpha = \\text{LLM}(S_t, \\theta)$$\n", + "\n", + "Where:\n", + "\n", + "- $s_{t+1}$ is the next token to be generated\n", + "- $S_t = (s_1...s_t)$ represents a sequence of t tokens with $s_t \\in V$\n", + "- $V$ is the vocabulary with size $|V| = N$ (typically around $10^4$ or larger)\n", + "- $\\alpha \\in \\mathbb{R}^N$ is the output logits/probabilities over the vocabulary\n", + "- $\\theta$ is the set of trained parameters of the LLM\n", + "- $\\text{LLM}$ refers to a deep neural network trained on next-token-completion tasks\n", + "- $\\text{Categorical}(\\alpha)$ represents sampling from a categorical distribution with probabilities $\\alpha$\n", + "\n", + "When applying masking for guided generation, this becomes:\n", + "\n", + "$$\n", + "\\tilde{\\alpha} = m(S_t) \\odot \\alpha\n", + "$$\n", + "\n", + "$$\n", + "\\tilde{s}_{t+1} \\sim \\text{Categorical}(\\tilde{\\alpha})\n", + "$$\n", + "\n", + "Where:\n", + "\n", + "- $m: P(V) \\rightarrow {0,1}^N$ is a boolean mask function\n", + "- $\\odot$ represents element-wise multiplication\n", + "- $\\tilde{\\alpha}$ is the masked (constrained) probability distribution\n", + "- $\\tilde{s}_{t+1}$ is the next token sampled under constraints\n", + "\n", + "This formulation allows the masking operation to guide the generation process by zeroing out probabilities of invalid tokens according to the finite state machine states. But instead of checking the entire vocabulary (size N) at each generation step (O(N) complexity) to enforce output constraints, they convert constraints (regex/grammar) into FSM states and build an index mapping FSM states to valid vocabulary tokens. 
This achieves O(1) average complexity for token generation.\n", + "\n", + "In summary, there are two stages in the Outlines framework {cite}`vivien2024regex`:\n", + "\n", + "1. **Preprocessing Step**: Outlines converts a character-level deterministic finite automaton (DFA) testing whether a string matches a regex into a token-level DFA testing whether a token sequence is decoded in a string matching the regex.\n", + "\n", + "2. **Decoding Step**: At decoding time, the DFA is used to determine, for each new token, which potential tokens are allowed. Starting from the initial state of the DFA, the allowed tokens are determined by the outgoing transitions from the current state. The corresponding mask is applied to the next token probabilities and these probabilities are renormalized. A new token can then be sampled and the state of the DFA updated.\n", + "\n", + "At each step, the model's probability distribution is masked and renormalized according to the current state and valid transitions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an example, let's suppose we want to constrain the output of an LLM to the following set of options: \n", + "- Y/yes\n", + "- N/no\n", + "- N/never\n", + "- A/always\n", + "\n", + "\n", + "This can be done by creating a state machine that has a start state, an end state and a set of valid transitions between states with possible states represented as the following regex string: `r\"\\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)\"`.\n", + "\n", + "The state machine below illustrates how Outlines works under the hood {numref}`outlines_state_machine`, where:\n", + "- Prop: Represents the logit token probability given by the LLM\n", + "- Mask: Mask value of the transition as defined by the state machine\n", + "- Final: The renormalized token probability post-masking\n", + "\n", + "```{figure} ../_static/structured_output/outlines_state_machine.png\n", + "---\n", + "name: outlines_state_machine\n", + "alt: Outlines State Machine\n", + "scale: 50%\n", + "align: center\n", + "---\n", + "Outlines State Machine.\n", + "```\n", + "\n", + "The initial \"Start\" state contains a masking table that controls which tokens can begin the sequence. In this example, only characters from the set `[YyNnAa]` are allowed as valid first characters, with each having an assigned probability and mask value. The masking mechanism effectively filters out invalid tokens by setting their mask values to 0, ensuring only permitted transitions to the \"First\" state.\n", + "\n", + "After transitioning to the \"First\" state, the system continues to use probability masking to guide the sequence. For example, when receiving 'Y' as input, the masking table adjusts token probabilities to ensure valid continuations.\n", + "\n", + "This finite state machine architecture serves multiple purposes in controlling text generation:\n", + "\n", + "1. Managing token probabilities through strategic masking\n", + "2. Preventing invalid token sequences \n", + "3. Enforcing specific token patterns\n", + "4. Providing fine-grained control over token generation and validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "This provides fine-grained control over the model's generation process. 
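To make the masking and renormalization step concrete, here is a minimal NumPy sketch (not Outlines' actual implementation) of the $\tilde{\alpha} = m(S_t) \odot \alpha$ update for the "Start" state of the yes/no/never/always machine above; the toy vocabulary and probabilities are the illustrative numbers from the figure, not real model outputs.

```python
import numpy as np

# Toy vocabulary and unconstrained probabilities alpha = LLM(S_t, theta).
# Illustrative numbers from the state machine figure; a real vocabulary has ~10^4+ tokens.
vocab = ["Y", "y", "N", "n", "A", "<other>"]
alpha = np.array([0.15, 0.13, 0.14, 0.12, 0.06, 0.40])

# Boolean mask m(S_t): in the Start state only characters in [YyNnAa] may begin the sequence.
mask = np.array([1, 1, 1, 1, 1, 0], dtype=float)

# alpha_tilde = m(S_t) ⊙ alpha, renormalized before sampling.
alpha_tilde = mask * alpha
alpha_tilde /= alpha_tilde.sum()

for token, p in zip(vocab, alpha_tilde):
    print(f"{token:>7}: {p:.2f}")  # Y: 0.25, y: 0.22, N: 0.23, n: 0.20, A: 0.10, <other>: 0.00

# Sample the constrained next token from Categorical(alpha_tilde).
rng = np.random.default_rng(0)
next_token = rng.choice(vocab, p=alpha_tilde)
```

The DFA then advances to the "First" state and the next mask is looked up from the precomputed index, which is what keeps the per-token cost constant rather than proportional to the vocabulary size.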
In that way, Outlines, the Python package, provides several powerful controlled generation features:\n", + "\n", + "* **Regex-based structured generation**: Guide the generation process using regular expressions.\n", + "* **Multiple Choice Generation**: Restrict the LLM output to a predefined set of options.\n", + "* **Pydantic model**: Ensure the LLM output follows a Pydantic model.\n", + "* **JSON Schema**: Ensure the LLM output follows a JSON Schema.\n", + "\n", "Outlines can support major proprietary LLM APIs (e.g. OpenAI's via vLLM). However, one of its key advantages is the ability to ensure structured output for Open Source models, which often lack such guarantees by default." ] }, @@ -666,7 +751,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In this example, we will use a Qwen2.5-0.5B model, a lightweight open source model from Alibaba Cloud known for its strong performance despite its small size. The model excels at instruction following and structured generation tasks while being efficient enough to run locally via Hugging Face's `transformers` library." + "In this example, we will use a `Qwen2.5-0.5B` model, a lightweight open source model from Alibaba Cloud known for its strong performance despite its small size." ] }, { @@ -772,7 +857,9 @@ "source": [ "### Ollama\n", "\n", - "Ollama is a popular tool that allows you to run large language models (LLMs) locally. It has recently added support for structured output generation. The current `ollama` implementation leverages llama.cpp GBNF (GGML BNF) grammars {cite}`llama_cpp_grammars` to enable structured output generation. llama.cpp GBNF forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It's essentially an extension of BNF (Backus-Naur Form) {cite}`backus_naur_form` with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. By enforcing these constraints during generation, GBNF ensures the model's output strictly adheres to the desired format.\n", + "Ollama is a popular tool that allows you to run large language models (LLMs) locally. It has recently added support for structured output generation. The current `ollama` implementation leverages llama.cpp GBNF (GGML BNF) grammars {cite}`llama_cpp_grammars` to enable structured output generation. \n", + "\n", + "llama.cpp GBNF forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It's essentially an extension of BNF (Backus-Naur Form) {cite}`backus_naur_form` with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. 
By enforcing these constraints during generation, GBNF ensures the model's output strictly adheres to the desired format.\n", "\n", "Ollama first introduced structured output generation in version 0.5.1 providing support for JSON output but highlighting additional formats are coming soon.\n" ] @@ -1017,7 +1104,7 @@ "\n", "## Acknowledgements\n", "\n", - "We would like to thank Cameron Pfiffer from the .txt team for his insightful review and feedback.\n" + "We would like to thank [Cameron Pfiffer](https://x.com/cameron_pfiffer) from the .txt team for his insightful review and feedback.\n" ] }, { diff --git a/tamingllms/_static/structured_output/outlines_state_machine.mermaid b/tamingllms/_static/structured_output/outlines_state_machine.mermaid new file mode 100644 index 0000000..c170783 --- /dev/null +++ b/tamingllms/_static/structured_output/outlines_state_machine.mermaid @@ -0,0 +1,43 @@ +stateDiagram-v2 + %% Main FSM structure + [*] --> Start + Start --> First: [YyNnAa] + First --> Yes: e/o + First --> No: e/o + First --> Never: e + First --> Always: l + Yes --> End: s + No --> End: o + Never --> End: r + Always --> End: s + End --> [*] + + %% Initial State masking table + note left of Start + Initial State Masking: + Token │ Prob │ Mask │ Final + ──────────────────────────── + Y │ 0.15 │ 1 │ 0.25 + y │ 0.13 │ 1 │ 0.22 + N │ 0.14 │ 1 │ 0.23 + n │ 0.12 │ 1 │ 0.20 + A │ 0.06 │ 1 │ 0.10 + others│ 0.40 │ 0 │ 0.00 + end note + + %% First State masking example + note right of First + After 'Y' State Masking: + Token │ Prob │ Mask │ Final + ──────────────────────────── + e │ 0.30 │ 1 │ 1.00 + s │ 0.15 │ 0 │ 0.00 + a │ 0.10 │ 0 │ 0.00 + others│ 0.45 │ 0 │ 0.00 + end note + + %% Final State note + note left of End + Final State + Only accepting state + end note \ No newline at end of file diff --git a/tamingllms/_static/structured_output/outlines_state_machine.png b/tamingllms/_static/structured_output/outlines_state_machine.png new file mode 100644 index 0000000..a2f1dc1 Binary files /dev/null and b/tamingllms/_static/structured_output/outlines_state_machine.png differ diff --git a/tamingllms/notebooks/evals.ipynb b/tamingllms/notebooks/evals.ipynb index 92ee08c..6b5b1ca 100644 --- a/tamingllms/notebooks/evals.ipynb +++ b/tamingllms/notebooks/evals.ipynb @@ -1244,6 +1244,8 @@ "\n", "A major challenge with these leaderboards and benchmarks is test set contamination - when test data ends up in newer models' training sets, rendering the benchmarks ineffective. While some benchmarks try to address this through crowdsourced prompts and evaluations from humans or LLMs, these approaches introduce their own biases and struggle with difficult questions. **LiveBench** {cite}`white2024livebenchchallengingcontaminationfreellm` represents a novel solution, designed specifically to be resilient to both contamination and evaluation biases. As the first benchmark with continuously updated questions from recent sources, automated objective scoring, and diverse challenging tasks across multiple domains, LiveBench maintains its effectiveness even as models improve. Drawing from recent math competitions, research papers, news, and datasets, it creates contamination-free versions of established benchmark tasks. Current results show even top models achieving below 70% accuracy, demonstrating LiveBench's ability to meaningfully differentiate model capabilities. 
With monthly updates and an open collaborative approach, LiveBench aims to provide sustained value for model evaluation as the field advances.\n", "\n", + "Another notable benchmark is ZebraLogic {cite}`zebralogic2024`, which evaluates logical reasoning capabilities of LLMs through Logic Grid Puzzles - a type of Constraint Satisfaction Problem {cite}`brailsford1999constraint` commonly found in tests like the LSAT. These puzzles require assigning unique values to N houses across M different features based on given clues, demanding strategic reasoning and deduction to arrive at a unique correct solution. The benchmark's programmatically generated puzzles range from 2x2 to 6x6 in size and test LLMs using one-shot examples with reasoning steps. While humans can solve these puzzles through strategic methods like reductio ad absurdum and elimination, LLMs demonstrate significant limitations in this type of logical reasoning. Even the best-performing model, Claude 3.5 Sonnet, only achieves 33.4% accuracy across all puzzles and 12.4% on hard puzzles, with smaller models (7-10B parameters) solving less than 1% of hard puzzles as of December 2024. These results reveal critical gaps in LLMs' capabilities around counterfactual thinking, reflective reasoning, structured memorization, and compositional generalization.\n", + "\n", "A significant shift in AI evaluation came with the launch of the **The Alignment Research Center (ARC) Prize** {cite}`arcprize2024` by ARC Prize Inc., a non-profit for the public advancement of open artificial general intelligence. Hosted by Mike Knoop (Co-founder, Zapier) and François Chollet (Creator of ARC-AGI, Keras), this prize represents a paradigm shift in how we evaluate language models. Rather than focusing on narrow performance metrics, the ARC Prize assesses what it calls \"cognitive sufficiency\" - a model's ability to generate meaningful insights and tackle open-ended challenges. This new way to think about LLM evaluation emphasizes creative thinking, sophisticated reasoning, and the capacity to make genuinely useful contributions to human knowledge as we seek to define and measure what it means to achieve AGI (Artificial General Intelligence).\n", "\n", "\n", diff --git a/tamingllms/notebooks/structured_output.ipynb b/tamingllms/notebooks/structured_output.ipynb index 7615645..f82f023 100644 --- a/tamingllms/notebooks/structured_output.ipynb +++ b/tamingllms/notebooks/structured_output.ipynb @@ -637,18 +637,103 @@ "source": [ "### Outlines\n", "\n", - "Outlines {cite}`outlines2024` is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. This provides fine-grained control over the model's generation process. In that way, Outlines provides several powerful features:\n", + "Outlines {cite}`outlines2024` is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model's output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. 
By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. \n", "\n", - "* **Multiple Choice Generation**: Restrict the LLM output to a predefined set of options.\n", - "* **Regex-based structured generation**: Guide the generation process using regular expressions.\n", - "* **Pydantic model**: Ensure the LLM output follows a Pydantic model.\n", - "* **JSON Schema**: Ensure the LLM output follows a JSON Schema." + "The authors solve the general guided generation problem {cite}`willard2023efficientguidedgenerationlarge`, which as a consequence solves the problem of structured output generation, in LLMs by introducing an efficient indexing approach that reformulates neural text generation using finite-state machines (FSMs).\n", + "\n", + "They define the next token generation as a random variable:\n", + "\n", + "$$s_{t+1} \\sim \\text{Categorical}(\\alpha) \\text{ where } \\alpha = \\text{LLM}(S_t, \\theta)$$\n", + "\n", + "Where:\n", + "\n", + "- $s_{t+1}$ is the next token to be generated\n", + "- $S_t = (s_1...s_t)$ represents a sequence of t tokens with $s_t \\in V$\n", + "- $V$ is the vocabulary with size $|V| = N$ (typically around $10^4$ or larger)\n", + "- $\\alpha \\in \\mathbb{R}^N$ is the output logits/probabilities over the vocabulary\n", + "- $\\theta$ is the set of trained parameters of the LLM\n", + "- $\\text{LLM}$ refers to a deep neural network trained on next-token-completion tasks\n", + "- $\\text{Categorical}(\\alpha)$ represents sampling from a categorical distribution with probabilities $\\alpha$\n", + "\n", + "When applying masking for guided generation, this becomes:\n", + "\n", + "$$\n", + "\\tilde{\\alpha} = m(S_t) \\odot \\alpha\n", + "$$\n", + "\n", + "$$\n", + "\\tilde{s}_{t+1} \\sim \\text{Categorical}(\\tilde{\\alpha})\n", + "$$\n", + "\n", + "Where:\n", + "\n", + "- $m: P(V) \\rightarrow {0,1}^N$ is a boolean mask function\n", + "- $\\odot$ represents element-wise multiplication\n", + "- $\\tilde{\\alpha}$ is the masked (constrained) probability distribution\n", + "- $\\tilde{s}_{t+1}$ is the next token sampled under constraints\n", + "\n", + "This formulation allows the masking operation to guide the generation process by zeroing out probabilities of invalid tokens according to the finite state machine states. But instead of checking the entire vocabulary (size N) at each generation step (O(N) complexity) to enforce output constraints, they convert constraints (regex/grammar) into FSM states and build an index mapping FSM states to valid vocabulary tokens. This achieves O(1) average complexity for token generation.\n", + "\n", + "In summary, there are two stages in the Outlines framework {cite}`vivien2024regex`:\n", + "\n", + "1. **Preprocessing Step**: Outlines converts a character-level deterministic finite automaton (DFA) testing whether a string matches a regex into a token-level DFA testing whether a token sequence is decoded in a string matching the regex.\n", + "\n", + "2. **Decoding Step**: At decoding time, the DFA is used to determine, for each new token, which potential tokens are allowed. Starting from the initial state of the DFA, the allowed tokens are determined by the outgoing transitions from the current state. The corresponding mask is applied to the next token probabilities and these probabilities are renormalized. 
A new token can then be sampled and the state of the DFA updated.\n", + "\n", + "At each step, the model's probability distribution is masked and renormalized according to the current state and valid transitions." + ] + }, + { + "cell_type": "markdown", + "metadata": {}, + "source": [ + "As an example, let's suppose we want to constrain the output of an LLM to the following set of options: \n", + "- Y/yes\n", + "- N/no\n", + "- N/never\n", + "- A/always\n", + "\n", + "\n", + "This can be done by creating a state machine that has a start state, an end state and a set of valid transitions between states with possible states represented as the following regex string: `r\"\\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)\"`.\n", + "\n", + "The state machine below illustrates how Outlines works under the hood {numref}`outlines_state_machine`, where:\n", + "- Prop: Represents the logit token probability given by the LLM\n", + "- Mask: Mask value of the transition as defined by the state machine\n", + "- Final: The renormalized token probability post-masking\n", + "\n", + "```{figure} ../_static/structured_output/outlines_state_machine.png\n", + "---\n", + "name: outlines_state_machine\n", + "alt: Outlines State Machine\n", + "scale: 50%\n", + "align: center\n", + "---\n", + "Outlines State Machine.\n", + "```\n", + "\n", + "The initial \"Start\" state contains a masking table that controls which tokens can begin the sequence. In this example, only characters from the set `[YyNnAa]` are allowed as valid first characters, with each having an assigned probability and mask value. The masking mechanism effectively filters out invalid tokens by setting their mask values to 0, ensuring only permitted transitions to the \"First\" state.\n", + "\n", + "After transitioning to the \"First\" state, the system continues to use probability masking to guide the sequence. For example, when receiving 'Y' as input, the masking table adjusts token probabilities to ensure valid continuations.\n", + "\n", + "This finite state machine architecture serves multiple purposes in controlling text generation:\n", + "\n", + "1. Managing token probabilities through strategic masking\n", + "2. Preventing invalid token sequences \n", + "3. Enforcing specific token patterns\n", + "4. Providing fine-grained control over token generation and validation" ] }, { "cell_type": "markdown", "metadata": {}, "source": [ + "This provides fine-grained control over the model's generation process. In that way, Outlines, the Python package, provides several powerful controlled generation features:\n", + "\n", + "* **Regex-based structured generation**: Guide the generation process using regular expressions.\n", + "* **Multiple Choice Generation**: Restrict the LLM output to a predefined set of options.\n", + "* **Pydantic model**: Ensure the LLM output follows a Pydantic model.\n", + "* **JSON Schema**: Ensure the LLM output follows a JSON Schema.\n", + "\n", "Outlines can support major proprietary LLM APIs (e.g. OpenAI's via vLLM). However, one of its key advantages is the ability to ensure structured output for Open Source models, which often lack such guarantees by default." ] }, @@ -666,7 +751,7 @@ "cell_type": "markdown", "metadata": {}, "source": [ - "In this example, we will use a Qwen2.5-0.5B model, a lightweight open source model from Alibaba Cloud known for its strong performance despite its small size. 
The model excels at instruction following and structured generation tasks while being efficient enough to run locally via Hugging Face's `transformers` library." + "In this example, we will use a `Qwen2.5-0.5B` model, a lightweight open source model from Alibaba Cloud known for its strong performance despite its small size." ] }, { @@ -772,7 +857,9 @@ "source": [ "### Ollama\n", "\n", - "Ollama is a popular tool that allows you to run large language models (LLMs) locally. It has recently added support for structured output generation. The current `ollama` implementation leverages llama.cpp GBNF (GGML BNF) grammars {cite}`llama_cpp_grammars` to enable structured output generation. llama.cpp GBNF forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It's essentially an extension of BNF (Backus-Naur Form) {cite}`backus_naur_form` with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. By enforcing these constraints during generation, GBNF ensures the model's output strictly adheres to the desired format.\n", + "Ollama is a popular tool that allows you to run large language models (LLMs) locally. It has recently added support for structured output generation. The current `ollama` implementation leverages llama.cpp GBNF (GGML BNF) grammars {cite}`llama_cpp_grammars` to enable structured output generation. \n", + "\n", + "llama.cpp GBNF forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It's essentially an extension of BNF (Backus-Naur Form) {cite}`backus_naur_form` with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. 
By enforcing these constraints during generation, GBNF ensures the model's output strictly adheres to the desired format.\n", "\n", "Ollama first introduced structured output generation in version 0.5.1 providing support for JSON output but highlighting additional formats are coming soon.\n" ] @@ -1017,7 +1104,7 @@ "\n", "## Acknowledgements\n", "\n", - "We would like to thank Cameron Pfiffer from the .txt team for his insightful review and feedback.\n" + "We would like to thank [Cameron Pfiffer](https://x.com/cameron_pfiffer) from the .txt team for his insightful review and feedback.\n" ] }, { diff --git a/tamingllms/references.bib b/tamingllms/references.bib index c88ffe0..86c4761 100644 --- a/tamingllms/references.bib +++ b/tamingllms/references.bib @@ -392,3 +392,41 @@ @book{build-llms-from-scratch-book url = {https://www.manning.com/books/build-a-large-language-model-from-scratch}, github = {https://github.com/rasbt/LLMs-from-scratch} } + +@misc{zebralogic2024, + title={ZebraLogic: Benchmarking the Logical Reasoning Ability of Language Models}, + author={Bill Yuchen Lin and Ronan Le Bras and Yejin Choi}, + url={https://huggingface.co/spaces/allenai/ZebraLogic}, + year={2024} +} + +@article{brailsford1999constraint, +title = {Constraint satisfaction problems: Algorithms and applications}, +journal = {European Journal of Operational Research}, +volume = {119}, +number = {3}, +pages = {557-581}, +year = {1999}, +issn = {0377-2217}, +doi = {https://doi.org/10.1016/S0377-2217(98)00364-6}, +url = {https://www.sciencedirect.com/science/article/pii/S0377221798003646}, +author = {Sally C. Brailsford and Chris N. Potts and Barbara M. Smith} +} + +@misc{vivien2024regex, + title={LLM Decoding with Regex Constraints}, + author={Vivien Tran-Thien}, + year={2024}, + howpublished={Blog post}, + url={https://vivien000.github.io/blog/journal/llm-decoding-with-regex-constraints.html} +} + +@misc{willard2023efficientguidedgenerationlarge, + title={Efficient Guided Generation for Large Language Models}, + author={Brandon T. Willard and Rémi Louf}, + year={2023}, + eprint={2307.09702}, + archivePrefix={arXiv}, + primaryClass={cs.CL}, + url={https://arxiv.org/abs/2307.09702}, +}
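Returning to Ollama's structured output support described above, the sketch below shows what a JSON-schema-constrained chat call might look like with the `ollama` Python client against Ollama 0.5.1 or later. The model name, schema, and field names are illustrative assumptions rather than the chapter's exact code, and response access may differ slightly across client versions.

```python
from ollama import chat
from pydantic import BaseModel

# Hypothetical extraction schema, used only for illustration.
class Extraction(BaseModel):
    mentioned_entities: list[str]
    mentioned_places: list[str]

response = chat(
    model="llama3.2",  # any locally pulled model; the name is an assumption
    messages=[{"role": "user",
               "content": "Apple Inc. is headquartered in Cupertino, California."}],
    format=Extraction.model_json_schema(),  # constrain the reply to this JSON Schema
)

# The reply content is a JSON string conforming to the schema; parse it back into the model.
result = Extraction.model_validate_json(response["message"]["content"])
print(result.mentioned_entities, result.mentioned_places)
```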
Table 3.1 Structured Output Frameworks Comparison
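As a concrete point of reference for the frameworks compared above, here is a minimal sketch of the Outlines multiple-choice and regex-constrained generation features discussed earlier, using the small Qwen2.5 model mentioned in the chapter. The exact checkpoint name and the pre-1.0 `outlines.generate` API are assumptions and may differ from the chapter's code.

```python
import outlines

# Load a small open source model locally via Hugging Face transformers.
model = outlines.models.transformers("Qwen/Qwen2.5-0.5B-Instruct")

# Multiple choice: restrict the output to a fixed set of options.
choice = outlines.generate.choice(model, ["Yes", "No", "Never", "Always"])
answer = choice("Is structured output useful for production LLM systems? Answer in one word.")

# Regex-constrained generation, mirroring the yes/no/never/always state machine above.
pattern = r"\s*([Yy]es|[Nn]o|[Nn]ever|[Aa]lways)"
constrained = outlines.generate.regex(model, pattern)
reply = constrained("Will free-form LLM output always parse cleanly? Answer in one word.")

print(answer, reply)
```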