From 4aa0d970abe2112ce544b96696fa6c5246fa109a Mon Sep 17 00:00:00 2001
From: souzatharsis
Date: Sun, 8 Dec 2024 14:11:33 -0300
Subject: [PATCH] update arc prize in evals

---
 .../_build/.doctrees/environment.pickle      | Bin 1606771 -> 1613013 bytes
 .../_build/.doctrees/notebooks/evals.doctree | Bin 317779 -> 320003 bytes
 .../notebooks/output_size_limit.doctree      | Bin 86075 -> 86142 bytes
 .../notebooks/structured_output.doctree      | Bin 140893 -> 140960 bytes
 .../html/_sources/notebooks/evals.ipynb      |   2 +
 tamingllms/_build/html/notebooks/evals.html  | 221 +++++++++---------
 .../html/notebooks/output_size_limit.html    |  60 ++---
 .../html/notebooks/structured_output.html    |  80 +++----
 tamingllms/_build/html/objects.inv           | Bin 1020 -> 1021 bytes
 tamingllms/_build/html/searchindex.js        |   2 +-
 .../jupyter_execute/markdown/intro.ipynb     |   2 +-
 .../jupyter_execute/notebooks/evals.ipynb    |   2 +
 tamingllms/notebooks/evals.ipynb             |   2 +
 tamingllms/references.bib                    |   9 +
 14 files changed, 200 insertions(+), 180 deletions(-)

diff --git a/tamingllms/_build/.doctrees/environment.pickle b/tamingllms/_build/.doctrees/environment.pickle
index 6dc259ec68ee413cb27c4bf3462bdceaa1a21433..dbf3bf1e35765408ce1f5ece10bd036e82107922 100644
Binary files a/tamingllms/_build/.doctrees/environment.pickle and b/tamingllms/_build/.doctrees/environment.pickle differ
zUz+P>qRj@w1R|TS`!c!BahK@dV0BMhHq$5F2l`teNL)6w{^}kFRVt6%gb1Q^fibKaQOvMeWRdi* zMBIX>#m@1*Wdm*#I8aS4EzKJsvKmyNKAS+QNVh;*0)!t*M|*j^1Id{jom%7@aE211 z)q@h%IdgwdM_Pu=u$KN@SE^^nx53_=fRpea^=qRH6#N}7$Y@=X0`N2zl^#jP9Txzh zMqKfMbPn7^5}i5sfhHwiheb*8uBFtC^62}kuh^$-ANt1@ADB8)(t)Of!fw(jR81gY z3^7v`8&_jm58*tEw>nMoODNy`C+A#!YG9Hi?(^heSl7QHC>fwCmwdrORakWT`{UUT z7X|?$$h(ltko*z$Cb%M#g^Y6n1A3HHQVS;KCnwU~xQD_j@y!As^AG7ykoQtX*5OOL zGPtxL*%9B6$+&e3=G}Fc0WOVis5Or4sC_15mm)a*1|T_Uh6CgaO(!{E2FO(-e6<6?WZ3Wufy#nsCIzhuO8^mNmWt*;w)~9c81`q z9?-;zFlH&#fYIrW@4VR=h^K~@KaHXVUo#>_@MzX&gUv2p30?ogsqPXeY~QrQEM{2V z#*E7{J+u_UCd%e5>Q@+9i2HY5km*S#lp;o!PF2)oz`B`@ecdkN0f*JG=y4i@1yFFu zHTlg0xva}@2Wk>sY56|0qAt+^1-U=%W?2%!A;Yb)0FTG8Y0lCiT z1QzYMyN@&V6kynds3!f(EM!8>@tz&`XXVL_G;07nZCG~XGkYjb4T6!)r`TF59B|;$ zkMvKydCUU-@X!-r%hY2I27TrsnFShm$Q)l)@SmJXcFHBS z+SC*qbBr!VJhju&o|&_dP%E8=t4oRqgOC@~zX@8cOBei$k`B8Km}z^Gu59gJluu!# zi-S<`EVg$f(40hiJeLVqz9^MI&N~Nth_59nFiGaMeC38fbd0?i(zODiMUYkfX%=1~ zq-_YgVLNhvX+z8O5uqnDw1{zt)7B%NMW{C0k2|nhpcW&+mKYH_`2s{oD#1x;k#xll zn3e$%fI^6jtciXtD1}KYVRs}TbbYukfgSijJI)(AP8$gjZQeB`Q<}BJffzObyguS< zMusG31VBvKh~>K#L?I_!k>G)_{##~l2vHuS7rE#XJQ$*Y1Y@kijfY5J{2VyklF+ZKXnv9Ri6wZ4!P7;v$5;%>$%JVW6ard_UlrZlYkUWoTRF z*!jRj*5QT?L0M096tXTX5L%Wl2aW-id%NUXMQB-XFRmhSHhRIARLdy?JY~s|A6S}9 zwPjoc9C~7pK`N@!{#rd>vqMjW5K2)@!CKmbWd%ATzi!k1Q>%V4SKCbzT|!`a2G0T zCXwo*ku(cN|8m^&DIZ`qI@knRZ-~&h!7PvmA#idvjR6Q!POxcp)QoU|y{?x`_-kQ- zu`l9zJw;QfrM)W~hnv88kjod7i4of@=>P?hw7fs#=H>S-?$);R`hEcN3wc-Vv|W^I z%jqixRyd(zfpoNSYiwpwAo*JoosJs`qJMC43K>%6KHE?oB8{@H%X&HT;%<(&!Qm|HSUpZ51I zGQhvsl#09Hsh$paW?=+cxM+t0-kf}Whso22I^%%CCIlu_PzC;SKXoXN`ycZqoWYECjNt}{O^jV#CB~vXD zPWJZhxZAlOAqFNRR)$E?r+{%RmTtOCUA0J&8d#c#34^odsounZUUEzE;+ zF4x&u7a!h>Xos}5UH+A^Q=pzq1gyVN24ui5p~ig`)?uZMPj)kiZ}VxolK|4OsYHy>xy+$xwuHX=HiJCm1ND-3#KtWxyoS zm62mIggqz;Vnb0P{;wdd1*@56$1}5ue@POjAb_$1O6V5Cdy3sEXb}PyI5_Msa{8G) zp=Ll9jtw+qdV87Gcj&lSKw$D%ykde#7)y67AhZB3CvrHQ5=I}|GdnLT(NpM&1=GqG zoec;x>D0>5r#8_bA}jUV=jv<|Nf>f_ibPa1eu1cOyZRKU9%MG z-%Bj7xNI6?$u-oZqCVkf2@0)%&V}2AtP77G0aJ>Eu6N=V+@ReD^1F%2P(8vslI!Hb zTYhMxrqH;&z_m@A*F=w0a~p!L1OOSSvyDs*MQaq~pHTVq*CU&cAtY_%>!*FLOVmGT z(U(&)ZBmv+%y+!aP5-3a_!OHW)dyT}fuywda)p8{FD;jwiyCl+9pO$aEpq zfQ!=St_8lUzCJS`KP?0I7DL$yLIXsR)NhXf)HJRZ%Sw0ky0GvgP*pD7gN+DO>1%hBC4eDC zB+!tJ=LA6r<}j5YK=gMd$;c|4z{8hjdgBNWq~kx>bUVv7^xfH`Q@*S-osahqBxEaj z(Mk?TN{MwWlaS0o-N+c?iVZ@Zb8V&RRG{D{gl$?RT`eq$v`RXTIm{z+*|ap-< zXdooYQeY*v7Op|^4H>zi0QUeQZgHcVQk^dZ8z>w|&U|#2eVlTCkQvN@D1inX^O2j| zLM>eg0?1h{C{R|-Guz*g>yf)y0VSKHyerPQ_!j*CnUH0E?YUTpLCxLvITYI07ULTF`*A~Ko2WtPf_kx1?Mnw-m6 z|B<*PCl3>W55;%zJ3bRAiVH7+gS+9Pb+@NB_m=S@&k_|1}N&xdshtEpVD!Wi7g zEC>$l<_5#t!mNo-VoU>}*!aHS#p;Z%>AJbUE|b{cvhE>uS64#VA&vgpi7eBA#?#2V!)CmhPImD;G04m0icWfrL`YgzGGYez>aOjX<>yzZl4@w z*OLNbmb}1nNmb?` zU}SysHN@SvpuXivUe&RfQGCdm?ZS7!0O=%T7`U2X@p+bWg<5^?ftK9nPPZgufB|!$ z3T&FDH4Z5CPQ3hy02(hm+Y>5%QHy_n`bpF z)oBD&xMAYO{zHa`Z>fuo1U7H%Up9AjaB&b;axCpawn`Ur1eWKJ8?NLU!(p*F0#t#Q zS@4&&n#Tq7H*e!9L)?J;=zQ_gN~T2!H?4p$^sq#SZ9L!%B=*L+>|2rGqqX!X8zN+h zCd}s-f{hJsDL0-Ct$-9kWUF+Q=1#3Th!7JPt^XWCFa;D`&#bf6&Jr05b9L9S#I94myAlMsB)bHv|-lOj@bj6kln-VE+S!h(RL zHWtdzv@)-&IOK@K>CE0LB(D#Bq5*WE3=qnit%tM@6zxDfQDkBbAT;BV5Ck~6wSzXX~vx~28PX4b;oLJBiCR5Cev_RDY+Wj*TZ?sd(4;cr4HaX(E} z;V63{d7Tjno(Nvt%ngub_UKo_I}A+T2fu*SNbPN8X&M~92L@@%=q&4l{P?1FdXYj2 z5y>UH@fx+T5h8_*TP}JW@F%xG5siXxaX-^3&8!-1&LKLUdHl6biBrJ?TEZb92`5zB zaL&PvOOP&*!^ko;h^nT`+HkhOd5A$AEe2A^BK~G1j&+enuut}O#dq9WA`XWNQ>(G- zhW&KMU0D~4L7=&7pubD%r}814fXgHf>Q24lI9b&y!8tp7Y=XBrNot`klSGH9kjFNc z&9GyM4$SLoPUgF5)e_Pc>0)rIdZU}20coq4r9}v%0y$QD;t9DzTvDsG^j9)u{B%Ia z?PylBev({>N}vVCdbaW8DPj#J_cv#>co!iI_`No`9uRYc5Nv@tGsSGW3=K(AMWCTY 
zKM2tXG$v|r)CIKSyJB6(n=^elxF9vaX!nIXnzeuh)+O1HAV3v?*)CXkn?*iZZ3{q} z^8&YJFVLLp6P^D3pL^Bgr^KR8ZdoJGnvjO>#|@o7*H0}wm<1faM@lEKlItxBdZsvG z4+5=;L|5L;G45{`9-h{Ha&(1>i^%e~#JiGhGA76mC4G~R?sh3cKpyhxcrY(a$llQv z$4Of^`+N(fLgcJzmi>dj8q>xF20WTl5gx+%`xMegx>&?ZAXq4*r7JC6se~TS>o7k{ zFi1Bxg9xrl_6;HK4jsumT$U7Cnnk)VO^tRy#CqkDFtfV--T-ZRE-C*IAdzztk!UU|XRsyF zn{=>qgQq$Yo!Um)LH;8*r)H56OYgdPPlucErn@Me5)`iO%Dagw#IsRGCZ;np z;T9taWbBv1694IIi5G@P(GZpHJRs9#w-#C_L}e&z38t=V~0FbF+hiCdo~@ zWky2;sl-4d(v{4D*wr_R^$-4Tsqv+v3EMJw^jyzmQtlDt)s-$x<*k_1X}Fs#n=YGM z1o#Zd!Guogjd$P&&VE|)exQOuM#d;Uq)2oWOZ<5dsgo#r-~<6d2@u%467zd=qJsr2*u+|C&iGq! z@|X*hC~tyszn!!c#;lrKQlEhcJHcsYmq{3E$z-OHy$Rn7v2s5Q8EolA93vB>;;@83 zD@8iG?54|cu?~a*gp&*t902c!gVl*+U_UKyfg2%-O#*F2ggpZ5;kIr!d=NB{SSKe7 z?yMG^SkGIGry2~+z|D-FW_f~KDjH)tj5y(;4X>~>SXXhnMcbslelHt;p?4Ex!O;%N zXeT3!d!}COQUF63kH@^!4`0QCL=w;ZOQ*3u^)I_*+L;?pg?0XZ9h|FQF)yJuo$mCO^tHee*=be zB4`2Q4zs}V<2A+onWX>zRw&ZaJG}KPa2#64_x*iR%UF3G`B8c@NS)}MlpkC4X`r^w zt2R|#mg%>Y*sir}$K6sIA{k*{zAb|jd@e>9oPJIq)fF>u3R$9p(uYD@Pr99J4$wJj zUeH?h-#$78>@w$CG128`Cx(zTCDfY4%ucf1mU#UMt-v}sfM|SYRD7)%MA_g3uvu`u zpdg3~>w7Zm64a28vN*erylIjfm5j(me`~Uz^+pIH*ot_W97`x_=h)IwM69`O4&j|) zWZ%J@O}igQ47OtJ)>0}6@i95yAb+hQk*y4&R*FHuNiHd2aGx;lwQwGwKr38T4xaM{ z?0;MW2>h)$s6)$uurJutB)8js}Avlw9!5bIEZRt&51@;bcms#UK8HRL2*RtCm;5$Ni zqMNDyWJLd(&X@2WVu~n-4Z_We0Yq@RuLPywI|;WtU@Xnl4K!tOWl3?x5c6yBWMbED zr;S_(_9gnsvC>f6{pY}HgZ=VGW0M?$(rbrecfzwvo#p#DjI5b?;JM;nzlS~}BDg=Sm@Za*@=H+Hy_tAYY&e9C=!*Bs>_pBGY zq0c_GcTp_jq|(7(AP*`IHsai_>k+6Rxg6FfrOK5rKZMi@h=DfUFK3zZxP$?xIGyR> zUz1RTs+p$J!*QQ+g+RgDvauY3YbCG-Cl2}1WP@2UDZ{?)!cM+5NtQX%I*@D)c8fir z;DR-jQNg_FHL)}`#GE=4)5A1!Ar{^`HxngplEJ>a7(!tSNf{D^biaJF;l&{m zXZ(X}>DFrPth2VHS~$wrl5o4>Hn6rUml;S%`N3T{1lA@MCzJXRbYnk(DB)1>)Z4e( zE?`(l1mI-lo?90P6v^jXE6W>=O0%4^+}Yl)yuz-3jw71@0@nMWU#_PssIS2Smuo8w zpEYCFOdRj%@4}Js_3=U5_3geM7{~^dhw<}*Ufzzfa(8KgyiYx&F82L-KcLq?qSwtZ zxP7pbyN&u^^q}9{{%bEa_t zctc8=<`(L?&=Tuu8AN3`ks@FnLlwl%wW(L$7@E3=*NB%l#2&T`+#G7kxJG;)J?H~B zpW$E(+`{i`19vRFwG78j z-430>2hQ@x8!EodY!PhjOZK|PpHM1${(>9>5hHE8za=w}a?#4=$DxY)^w3U2VVyJ< zIf2K1C<<(Vu56GyCdyZ!uW-w{T0hq0bIAB2)f&u_QQ*48b*q#XTDHm*lUqpl1+bXA z-bxo(k=SThm`RloZt;UP$uz7z#dX{2i1w|~qKGY+m=5ycgnC<>oLQkNIyalGDn!oyC1CQ@e!|4MdM`l5d#2Yut7T-g;yzWW&|(|J;0i3!xH6JWGi}>vHYY$l#wmbIaZ@>*&Nl^dCs==foI(hl2%2|3 zuTh6<)?lh~J1K~8eq*60jn~{AKPHlBNnlKBBz1QT+Z;mFC}I>GQYba9L^*Cks)G>!#>{o5T2&GmjXqnvtq5oL9^d zlFf&t5vFL#^Dk?6y~_(Hub~A=Q5lLt$1J|wO$)a9N6eq2Okkv+I8esXK1w^`RdS}& z6Xpk3=Ms7H4$B6~)m`d`AxO^n-sNrv5I@sP_lls<_qbEcRxX1dETP?uVUW_YV?0o4 zVB0n8mhsElmhr^vLNoVp%XrAz*prqm;{iY9wv6NHLEkb~QMtI+Kj?>S%UE`>I`fp+ z==`Bnxp%un+bxdVz8p6LI$kANIxB%(Ra%CApXn^_VA z7T#56F}AU??zQUr(_+(SKS&lPExG8|P&ABE&IYRIPm4WnVcK)35yQ@rrw5&B&u6V0 zc8&{mroC8W+Rstt-Z3p@hmwZmxmcMsYa((>%M(ckS~o&CxOfZ$b&15rjMQ5X6<54f z&HiQX^uo6z0#wk-|(WHf3Tf|J4;!LJ=b9s0Jp(@gOROG_S z3?`hG67b^Qrj17$Cw5!6;qSHtXapB`Ail^g86;>SU*}1@uq%!XK`yXTlu}2?4p^5G z{yKp1(QP)eH8E6^&9sYlk|rUAZEbZbu}KERivwz?bHV}sY}uX#XSX#%$-mFzmSH%> zoECTFN6jVNZ_}XEKn_Wm0??mfr?7U$vaR;l!H*#|xC9!yQt`NvazbMF!JV%Z+aaSU z&VcmwmvkmcfL5Rh*I&q^Ly%Bu2siTtzyTRPNs(E_hGdGTUDR5COFMG0$@&8Rb7c>w z(1lcwb*+J1T zLcIdlR^V8Y3yctVkc*($14KeKutm8aa|vqKqD!sNba|QUQI1ek-X$q3b`nFm)(u80 zF0pGN#3q||*xtf)EjgBP%8&@6yq!>4m_1QQ<?+nO``tIZmIdp7fWssEqSwtbhO!Dtsg8G(qpAbRRRd~u-KhGOi7%H^tCQFtIj}1%Vu+O>*&!TQ@J0%EsEJ8vgh_@F zYL`Z^f-H1o9Ws)-W~DRuLWYTZP!zNGi{nnQx@s4oI8g$dCbOM=#dIKJVaY|%;+&`f zFOJidNk~$vW$$x9gwnUlSMh2L7mN84Ltxgz!Y2{eNJ)dFgMo~72sRYtATXahrMV@& zH7$sXlGY(g+KHT$+zv)cBiSmmZktCW&_MaPiaTP7S_89iibZ1!xf~iR0`8Dz)fp9J zUSuayg)opm5BW!vZfXtgGmuYoWb$=A{N$WICzM}BaY3%SjTins+2972w?i}D=u8WL 
zEmQ@;u21t;6av*fnGP0xEoM_@=5{)1U9r4<5l)Qb>KT`VX})E+n7z~$%eJb#C5QB< zu&6jrT6n)!Wt8@0LY|Wnfk?;)&h9@ldStaVTtyrDQMS;E>(m$0u_N}(*f1v(d%-%f;pX_2rPn>!k*sw7zVo> z06JA^_{?*ANRFd54z>gTc#6BEovJZtW{pis zxY~zmladLIz5P2Yoy| zi}m%FMTY3n10n5VUbl>r-et@oCF~GPteGUc%E4d?>rCFsrc%*rxUU}QcD-itK7IO# zv3qFo4)@2d%bTOy=Kche(J&q!O%M8fUO?qxU73x&xi%7mxVBI^hTaW*=9-yZb1l}b z&2@GZd{FhGgkv2C+- z$6St^m*YKct|u%VBiV2d0H424Vj~iG3xzkF3VnvwX|kRTgFa0vit(T~s)bFlS$nqN z{dH4pr-ch{p;inoxQ!lkF1Uj=Z*aj~8W-F{FD+bfAHCZAEgk&)DF*Zfam>YH|Fo{e4TOem zCBB&vDrZ`>Co5mUTV*5t$=-)Qn{7+75q7mN(-oTK9x38=>jVKnMxN@sp|p!xrj=G55fGbTujABP*g6V z(yyuXIVyR-ia(#H(qt-ap;9B2zCfj3Dt(bkgH$R|>6=u#lu9pC=`t$q^ELdroJs>! zx`IltQ0Yo4mF2&Vzh9za6P2!_(m$y*NTp^RbWpCQ(&JR36AsEdIBlR%oV+p(@em4y zKPY3c!;q`q`fY}|y4&|o3B2p4W+FFSbE zAWm1DGl=VgQi*kF@FI20#jz^Po0_$Vqd9zK>zjE9#}iSh7>RAM}Q3Y8cSC#b}DI721I!vj=eJbWQad5wuLrI(C} zRVpzi-bN+H!}n5&@$kb`Vm$mfl^75Ih)RrypQjSz;a8}{c=$~!F&=)GN{oj;q!Qy{ ze_lb6G4Xy>VoW@iN{opoQHk;JG%7J3o<}9d!_`z`JlsGf#=|F3iSh6mRAM}wq!Q!d zJ}NODK8H%+;XJxSMaINeQi(C~^;7~A@1xvKCE{V_0hDs54fYcwo(m38h7qf*K3nI_ z3-D@k{mE~5Xg%RMlsY>xL~PVGzH7X434W(D+?J=%K7L&&y}FzLUs-w{q-P6{c3mrB zP%{Sx5z>1LK1Tz0;J>(Xm-b+SMpE*#V|nF1h62au`K_Rz#_0D@i81=usl*ulF)A@e z{~?taqd!L_#^^6oi81;cRAP+&4wV?A|BFhD(S7Hj#Dw*5DltB#4*Ty_HIg(NCumWArX6F-Ff)i81=wRAP*N36&V1UqL0t=hvZ> z*BJdaddV35D^y~P{!J<|M*l9A7^6Q$CC2D4P>C`6t5jl){uY%OqrXQb#_0c{5);;; zb5UY^K8i|=&kx8`kumzARAP)i10@rqhl$Z|zTU*>i_WarEk>8{IkMm~8LfQUjE)|N z_&opV$aXuUzX|ebeEu$#7@vQ5J;@)&=>8k*zRAP)imP(A#CsBzp`ZOvrMxRF| z#^}{lVvOEECC2C{QHe478B}5dJxL|T=Y3RSe4amtij2`eMF`t?*|jD9C`64k|H5f0{~+(SJcD#^|q6i3#*SQi<{T`&7D$&A*SS$QZr+MwA$% zkERl1^aH8H82vCRF-D(BCC2Crsl*t4F_jpjucQ)V^fgpsjNVQq#^*g$Vtl?4CH3)3 zW5VLvPmP}S?aLMVp9X`?B@mQahJu8n)*5_DFIZLb}CRTD5y>$VU5!qnHGl4X$IhGcj`73E%HL2d zgwlzPxUvE_*eLJQ7a!w~9pW>W#}~5#FeJM7#{s>X0PR(Ug zRR*^dkW??zpJ?xg?NjHf$%-~0rtwfGUKOjF!>ACmuoMUtoLXg&O0l)6zI%RUWuAQs zMr8ynAl4y)-6bA(i%NS=q==YziAu*bI8td2YIPRgE4tWPVRT8E=%UnE8S8A){+M83 zGG+tqfvhBBH8RHHNv8H*AFFAxO(vS@ES9qh;hbPK;SbnVK;)bi{b0NQhFD{nK&+gh z#x@FGzjb45o8a{gH^u(E+q3tZH^=55ceZE<>9jnXJHpzNnj07B(?FeiU99qy&l7E# zoTJjkgQ97GUr=Qqc3y1Ys_b*YGnGNzQevCJ)9T1KVh5_c01@r&DTaEeuk$gBGx?`yGe{NZ6{;zF0LcE+h44Qe&F#qT15eHRj z>bBTS!9acD<2UA~peVmW0gIy2r3h>PE5>gM@D_83x(DN@_yNu|9lxT0E^O6j`-Iuh zF>R|ps~OsKJpP{8*xjCBAKn|AI`SCn0IAR18LP_MUemZqN2Z3*XETdRV!GiMRB6nt zhKSSRw-UN|x550#$(RxIz5|8Kysz##f!2r#WPAPD5(w;5B#_c%<*w_B+{JdLtv?ZP z7poC-mxXrK_s42Z5~+;))=aM|W{cgUJKYuIJ*e4ut|W1WcU zkI=0R{}m&5o8!m{YyY=L?$J+q$4?^tvSv@V30(4&B{QJ6-`gfi2555|FD`*kAF_t9$S1&`7>Z# zt1dT0{!w4v9&6LsW{or3*jnM(=1DW#w29c}v!KLbp9v*wt#I68P(m3GR7~RLhed8- zhISu0^R3c`)qHx~vhb1E(Z?*=8yY?ENbCfSTjn`)i>(!oTfSuGmUa=hd=`}G_)I8a zYlY($gA&RqR&MDiatrhQeE)s{x3HQab4z|V?bujZ_G?V_RK@$M;#CUS#F-{};hqvL zsQFB0sn6aPjj8`RH#%-XN&lMV%uTjNIc`dsxv4|MP5&#D+V;OeDO;l)w;7aDzHH^T zuNS$Ec?KU9tz@ib$lSKuno$1n*dfO}v8N;hZ>g_79;?x0(|l)cvbDl-)A?p@N{G1W zv!KMUKNCvWTH(0GpoH>sE4N%x+{H3a>ye^eEUWqSxFz(x*nuNIveHPM`#pqp*J1SIJ+%4MIKF&YT1Pj2~W-eMM;-b%jLi;~!(F`>T*NkZj^Oem$%%4^x8 zBx4aJ5%NP?b#hkYz-#5D2#-5|7)$Rqv(^46wlLo(p`qqQ71l~=co_A9F&mYn%tn<4 zv+b6tZJc_=Q?an-PTc~$#Vx4aKOUkxrQyXQBiGf`NW7_Tqfi_K%+htW@pKhSYo>nv zhq3scjmUM5BOwAF)WnJLs8|{8N+HMGiG0tp+Y_5>DWO6 zch}FJi5;=q!#4fd*qr=M$*|2}Qd?Ln9kwylIX01~BxZB`f-3iDBJF>~%=u)Bo0WL=)gCcxhG=1M8Y9yXVLt7T^e9zz8C+SVrk;*PKGE@d+i|>v zwf{}y<=-V^40ZR%OZlyJwp~-yg;+f2UqrePs~OsCdv90l^SeE*?tLycc~K1gWHF8f zZ)?051*r!sdJ;W4`%aXS1L2l$e<6BM-k2y26KvWIZ=}}4{(e~EBIVl`%_h#Jv_ z7BAvOKaX|nHbr7D#7@dDr_NjL&c$d%VXbuTrl}W9)>26-YpK#;q1{qi4pe9SB37~PC5UtphrIKtyDO9=pQ})S< z$^+DAejTgO_8E^!IYX`b-4OMn<;zQ9n~gP%M~HXKLmq4>Mvpfovh4#^jY+lZrP#Lw zBm9XC{A0T&NS^$FYaR4>5_Np^s9B&vl{WNw+3Cl 
z`bKQdZV%9u*JCH-z0aX#!4f))ZC8Y~(gE@o=OdM*^N}ide?B6N`cQ2ujs?nM;A-#h zhd31*Za0=ne`DR^S+hht6bJg7H5g39X-Jr@T{~Z;{^gC>VS6?hQyd3FSo^;{7$?!-}?fFrahl|$yW8RIOWdFH3VNB&W1h12S#d$CrPU^#d zjol-7-TOBYhTFIPEjCAl${X(q>5zV32<4{tV;72`JTO*B`@cP*UM0}*~`(dm@@avBctQ5u7Kl@>9k>JovRyJir&rZv=<2trRW0$J9Bvm9&(A|D7uc~-kMmak zLGb!$U*!#g*RT63A3WYRMX-cKTBlbvA+29OE-LOz0+aq6-9;h7Wc!JeD}OBL;`RrE zl?w##)qjO5n+3R>R$h7dZm(1imsck8?@CrGO^p}UN>{4qNT@S4o=W1C9lxN;-B%l+ z#_N%}!RO_`i<|CQGv*>PckaxgN9Hd7%{W7q2(&aeiSoPWtYYAw@l3%olwpgr2vc)oySUr4tWj616VgPsek(yQ2OjZW{1WY-WQGUi_ zv_d+|Gh4*aqeuNLJ%&2r$jVdZ|DKp&Pn;&|+cPT{Xs5Af+A(8U4`$9M#PlQCp|YYH zu&QYV9IGOe!jS2R&MV07uFjZV+5O){&WfLX*5p(^uyX7PMVkQ2+ww1Rl3Q&8tj5N$ zqaik|*+$fF&8j@9S)`CxG#lH3-Q+)w{eC^C^7Q;+L@x_hLk4M_qX=uIGH6&)Po|Po zPo_#^Qtg(9YJc^kxs}5-8{4tKTik+r-|WhvWhrR*OKC>5ESJm(@jwa22{U3jU1?QC z_e!cioK;!(sSVnRj)Nww{jV6b&QEgCl*NE%S}6`Fx_vj^3$A{M{#fT7tFg^HZ8(pz zPK15y|8HBy7Khh-ns5SzjAgyFCN@kn!#IGD;?bL z6A9SPl}h6If?rVO?$1cf{_U3|vNo4IwX z6~S+IQ+PAF;54rP6~lU;A|0{;|yPx*3DmS79(>$JA()d`ezZ1VMn8>W=|5bPG z!BJIbyvr-ugoJ=3d=exZ12nvr1TjPjkdTlU*@Tx3NfRW5&EqyEBy121aYNNcV9ARO zhnAsZsWYu(k*d=H$5B*TrIi^cjwooQ(y8qXwif$W!J+MU&dJ_A_ujLc&2HS_Kj+?a zfA`++oZtD*feGMq_Df8f6KWr0`4twL4I!0EA2Ps>=I z?yV<*?98*7yn87sDhg$l#Zw}~@NV%KQ~b6OUpdCaD{C+Tn!50;`epVrIx;%`uB0k3 zgcV8BFm@nGPYKk>RMiIaC6`y&K(>=RC<{|>+F-S_gYs+qb{qQtMvD4u1W!>aSocKF ze03fQ1%v9I63O!$VCa{KBbaWv1oDs40KeUA3%q?%WiIX|l>Xe~Ez4nw5}UOO>OTd& zQCZ0G5|Aw^o5EDAr~DSH6ul#g_57$7t8yG)r>VSuQa|sQ&0X6|lFG{vd3%Z;$A_Kv zn_S4fBUzXiB#)x6n{1e08~V>|BwGw8P?NZ1a5K!${#aJR0h*&gP2I?U5Z}28hAa(i zG^9`5DXtjAk)Kz=Y(91PtY-=&W+>}4=aPXPa_xuPV5lwQRw_q@TQ9Y@!%M9}oP=8C zUwTdco2tpoW^}7eleyV+AJ&M6QRHg0dTX;gYAiWL-a!ksc(3g&f>2^cbj=o6ke&RY zr1J?}hm)BO=RJa{T-=22uc+rV@0h&u!LGA15>AImv5-r%~!R}(2Z$6mt%s} z=UJ7oBC8<;(n(Pjq;SIuGyN;CL5k2`?DZ&bG@8BUxs~g0C}c{xEM8|^+X{)yLt!D) zAVor^(xF$^Uvyx@^)k-*a-;iKU&b^2=P)zJAA-7;;1U2jXh+wk-+;QFsD||H(;+a7 zLx@b0d~N)_ByB4sa=p#;ueSzqq;soh2+^bJt(tAQ-iE>|>aF^AmDJJHTgK}9;bqkz zPQt44JG}zOQjZW9r~>bO9)sz!zuOq%OM$gr6*_BfM2FGdCBy~PeM5lz7^c=hde&DV zutpxQflRJ~OZ=;#L5k2hd2D5%?f%9ca^(w!5kG`X(IuKNMy7m(1*4;RVt|9Az!@<} zkuai6!lY=rYf#l^A61`Bnffp!K=p~pYjHUCI8^;vDE|F9+&-AaJ0LmB8VXwncR&Hx zpH;e|z-CVL{a}zHVaqiUGb7iZP$==kNF3dTA!-U?l$3;p5`z>8B}%Sdf!;_zo~~(u0v~erJ>UygA{?V zZhVwFU~x^v{K$1E6juC@au=OW(h&eMKAk~iLH=@5*$Q*`XFJ`XF1CUAUX3@ASG{w+B_;VjmQsS!mFcp-7f9H3<%@9Z zicH{`>q_+H(bN^?ioV79Ol@(Pc(4D|~uAR{EU2%&zK%I!J~ zIlGfHs}ftFE_L2%%odI@vq}agKr@cNK|Lk1&%ix&9KY+=80hCQ7{WMzc=!b{b6LBi zsj=CZI zZtTO|`X^-ss#bKQsLut_=V%Vuu!o+>M<#d&1wF%_5$#dFhrHgF5NY8e&SJvlB1q}u zfi859^ER%V%QyQN`~+1-aTj&9l>}c#7yXuz6umRb=r5yMM#>xbI!!ZuQ5|$J53rxg zzLpsxuVI$qmPg1WCnS}!&w>O)vl;9p{*wK0ch*f=kp?=6b-@|nw4Ol ze8Og68dH^5lkpx8_I zU(%tsJtzeuDz)#;#<*HA#X9f_5j?B|&x+t%9CYq?E4;wJ7SV~{*9u*QaaCxc4*YMG z07h18pRC$0fOQxVIdX0n-+MrZHq{D`?#77t=xcTXd

Uc({&&WW!Ui#L07)RWID0 zj}h^9Q-c6TV?+d(i(rlpJSKuCb>M65_*&=Wp$hg6#>DqVJc%fE4NnVsDn>-`qizAb zh7l2L?i0X07!kof5j@_ft6-Oo36K66BjTfzejc8lAP;;1aqDmz1js9QzZFc7tA)7s8{Db z4W;H>HQMns%rF~%%bZ6wy6=zRRgZly>AnJKi?x@$MX)@ZPZnzmZc zQ~4Ssrm3;M!C~KB)8xSmHMnUy86@U-(s=-)O_Pc10BqiT^$POvHVxjV!Fx3LD-HfY zgG)5HOoQ_@xK4wMH26CWKB2)i40^Y_B6c*@H8$CIxs}(ioLP1dR+>yvr0o#wkBI#k z`;+(-(tXLCK;m35+Z0QdxZuGF6d-SW5D(-j7fi#e$6PQE59eI)03L3-U^Ty|?^=Aq z%yzpg($P}uXtSe`YInP0#=OxLzbMjl7>fCw?|vSiO>P~=;)x}R-LMf4&D~JX!E@K+ z(}=kTGVp444?M)L#(Wt+i~P0+mhyzArjTv(qEksiFU;XLWPcSun^g5e3O?4^3*~sY z+zVCw_M{XtxG6emZo6HXi`l_ivn>&Iw4Jp}EOqQ^DxNl+A+1kFr;fu;?zuiFVlQ{2 z$CQ|UNW^!ZIs!?gupg%2`Idf|gNKfOxEBx4^g{+u%Ivs=ByyAAV?F|wd5qiWls0>v z9o?GOA~R}2+7(*KIYDJdU@lK@LJB$6J9P%>KLRtk&|(hEPa!YfJ#7I=bwfM?tK6^@ z53O!U!^258%=Isewi#U3Kiw=l<^iyB0ZF8M0AldmKEQHykRMJDz+Aj|X#f`D!E_WB nCdhL`)*J<=IjdoH!^(yg4Xb|n$D=SQy6*)2M%jq#2iE@s{1BrD diff --git a/tamingllms/_build/.doctrees/notebooks/evals.doctree b/tamingllms/_build/.doctrees/notebooks/evals.doctree index b363fbe6f5da25edbb9f37822e59a35b321031a9..c99dfb6e63c3bab38967d490ec3c3c4f9ec8af55 100644 GIT binary patch delta 8534 zcmcgx33L?Ymd^j@LPAJ@kbR{R_Jlw}5`ri)2_cXbf(WQ6>2xZjJ4tuw?gSDa29#)k zfJlX0w!@-4aY02m(6T6tJW&Tm7{vvZ<>|P+_XOVPQ;+jt?q5~uRMXBm%sJz5IDb{w zckli0{qFt$|5oMwyv7CJctfIHFwNJcqBH&ZyyP$aGVBnX^mwglV9qKcN zDW3!QdAl zj8}Sxo>D#zi&wr4{Z1JZ5wBbw_>D3wGGX5jVH240Q)Gft9Pyd*WmLQ}IdZ>pHabCx zhbLb!4^ByitfqZ1h$r<{w9du&+j6T@Pedsr-Q|0`e2-Ftfbr`~KW z^Y`+KS+U}65Elsl2ESnro z$>fym)l$A}mg_5Jhcq^QY?d^_B-LANmioqe$!WGbWFav=d#Ge_lD&nE76;jL*`-Am z+eqnQv&GdaFR3s&%nc4pi|mk{jdd=kuv0Nwnp0z`lcg$IZjkC|I-A9|K&qiBot6bQ zOSQ#pvbpM-B~w+C$z~>_78^}qmTfN0xlvL>BhBuuZfa<7*iGgd*(o*14vMB;YHhIF z9c0IDlU%>W!f0t$4f#(0D`m5((J4zVhpEcqve<2=x?9XU-O_sk6A6g|4Z} z)ZmgR2^6WlzLJt7)!OWf>g1{gvYH$UpBSca#Jnz(v(`CUBvzUs=+zSBBwo}ckm)M9 zsa3ACH`EK-C{Z?hlUyf>Dsadqr(Jc%C7Ww(mW7S7bCk$KmFg+EV$1`PZx$M|$b#_M z>7sf`vsb6lk2I6R)m;nTBvaCxT1ilnn@n{YMUqo?N!1Q}y_A_rMPm~_XNu6Xvqy_6 zQXgbdP*SZ-l$dH|(;`!|NRO+=?T>nMy@+dX2WNnUK0Wi4=u&1G^dASboz7AryoOeA}+ zp3$A_Pk-#E{;LUiuyaIBWXGaezaeP;e^_VuZV<%br0xEZ{cl-ncx*EuALe1!!#G=BM z06$V6EhdVu@o%iGiJW_c`}zGJtn${|TIRpsDsR25W&ZoE^48m0=D*)6Z@sN${`;-+ z*4tX01u(82??6@a1`2tA&Rx(j#>!EO~LR5emD>Y;PlUV zKinJwL)msb9s+~d)7rA!sx3>Fsa&yE;KqKC#JccsKRBkTsXo5cANpz*!`!F)Ly$i^ zrMVRbVcpM%hrvG>Kc9;k+x-)9Vg#hIOIRC0Ze0-*4NxFiyu7m5o9@$`_C9kl4h`j4}=R@2!%SRM;0d`6za&L!ivSQyOfb;znhYz9hk zFq<#epDy!K_$dx1@y$B4!3%Q7!(_f!hhFhO*Z(+@m5N`3`^bT`y~SCh^W3i_z%Ir= z)rCLSgwJa}eKd$j_`42W?S^n!Dh$O#5=g8MCM81#?>$-B@#;vfYdw%eJWCpq;D!8@ z1S46FhGcmVa*O|a-ui27ocnSzEMRPoCNWEsD4(qI^Y#$>2 zE_jM(P4n0(m>3D!7?2N1Ja7u8AOZmgkLMngUZj2J#Cafz9$3=y;LG3xVSWki)9Iu{CE|w7|s15?pK8ShT%y=Tx}K zAJLsVK8^Gr*YyjjwjRMd3c=*T%WUNQ1}{9h2o?xjn8(9KJzM03B6yndJ-X#hC2$9O zO=wPHdoa8d9;OQgoX_6vei%^(Gd=QcWsu4~>xQnEQ89I!8ao{_+4tShv(sru>V{%x zkbi!7y@JxRb}Wy=P1A{CLRAvzs(Sg}46PQfKJ1@@KbJ!ePArFxfg=n2{G6iu`lXk) z85Q8C<*<&Ws`iwIT?3S3wLytRiv@?YSCM#QrCc*e^mj=Qc6NZJYHSw1ITOaSHurzc zgi1zvR9*R(b{39nXLWO<*5{{uMuAol&)-W4oojIV*Lz8F1_sQAi9$#{5tbMdr&saD z+3*=}DDYHqQg`)#I0vq71$(haJLK`6sh=xAQ#(t+i4Ty$ZiV^>U_U#dE6gKBuBooPIDx)9WC1&WaPOMhV>Z)#+!rd=bK`f*5Gs8xOCi2J2$xgg~W^hMOV1+oj z8itV66&qyW#%e7G>IQVA8m!EM%N9Z^K2l3>81)NikJr1E6${`!)~wk+*bMjbWt#0S z3pvn?!4@c|i)!>qdeptEx%XONJ7XI?R+BY@2l?|;wLP$-u16YQse_l9+anwoxJ7tQ z{f4$J9IJ;0>Tf8(0{)&hKkl}X1E=sc8;qa}IdE6E%VX`ZnthGO>~NRIo$v;zW0VLA zvbZ~ymls01Kb~0(p*Z7n9)hDAVHEE_4d*pdvO=e6*$kN`QsKf0O)!G@nx?K@+}upv zo7#rG+|-?iuw7@Gfb((Fv~~2C#W0fJO-3jW_-hMEjl+AJ$^UKvmCPC50@GU{(Ic>` z1@7dP-U0y+Q(Hq8)K$2iQkT-)I}HK?dFf9)MqI_@!3PYJyKgQ96Jtljv^u-t?&FU_AmgXi z858iE$6yHmKsWgLa+>ipmM(`_T_7IYmeZ#CmAAmR%ZU!nLOc#_BY~g21y-~X$t0kz 
z#j|aczTiT2~Bk?rmGs$5kpK^@dT{q zIfYt}8`n-fZdM`nxbygwLebu?`&01$aHlo1udblxp;=W=XHCL`6z(iko3;5XA(gMy zq19ec-&J%-@6w^2UeHUcDAqT0=&%P8ZEJ)0l0lKi!<@TI13Me|_w zYIu>~(4k+wpo43moyQcZb(?&mhmeROalp`*1tuB3BSawUqFGXl@Z6J76y+;KYdE3^ z$F7BPez#^c*k}}YY=UIG)|>U>1B=u)S@&?lQ*b{s;rgdwI5(3bZLDm5*E-rL4{oGA zv2h)d@jwxEIm7r|kD_Q1FBXbk8D}Yw&AW=Uu2az@))uMs2X(>{{*n&8-~|P)ho|}5 znxA-SJ@kz_`I}*%)WW_{gvW2N$+!(6AktToVn2zSx~Px%o53UqL1p0=Dg4*lP(OlD z#`_g(+lcUpoiuQE+{^pp8^FdgFnuF+fSIHs4x=%}+DVeLomh+AM7>HX9^X!9(g=KN z3#72M!dglNctx?w+QH56N4`XdTD+i3Tc~6<=ssTD3VowHi+`(F zI*XO0PsZ^VidEJUPKx4+v---4hd5^|@%FmGC@1yunB;XqKs|!+z1y>jZ7`Qz#)I2v zMgLh3`(iOBAD~@(!*)6yE@%;d65{IKLRWROKGn07XtNg0p8MiXdQ}fy zP~zR_&35x=A%k&CiL&s8QKHG~Q*>6NYBsudQGZJ>;Td&N?jLr+4&X~m)Z;YxeS>{* zXok7J+6||HuhH@IKDhs0u>12ZCED(}c0Y9)&*Gu|w0S`92k2&?L#%H+wH5kM%>gFzJ=6*T?mAI)N z#Ps<{3yXABs_3e2!|G%GdjnoS2*VQYrta2MGxEVsYKUus=sV<)o@iW{b%;)`kWx%~ znWy66LlDJ-O7Vk3FbzE4V)72tJ~C7n={igz$)!rt6JdDbFn#68EY&!;{|Lo65tkkz zrze%F3n<{uQc=n#bRDIQ>XCQ+OL6Y&^gSjMTV97bLQ~Cd{#DN>5;B~zI1C5B0olwY z+z-e4H)zLc)Mt-6N}qxpIOZtaLw`03$F8F=FmO4|LS^r8?>Pz=ID6B*>I8h@A3r`j zqf`H&3pwJ4_63(DS()gxBXRp%)Z*U6V{g$(=||7rqV$F8SllUc^d?R`1=(>oXo8N= zMRrG(`uB=7`rA&UOLlfz2bAIJQMb{cH?qbAXCtd4#de+gW z&a$A!CBE`^TJNNsSU<|A;R7)wQ5jnk#qb9)G>c)7P>iI%OId5g-FR`g9&f%4F*IGw zJMd(Vm~MqSRl7RP3TsN4pI^Ibw_UZ_uG(u?ZLLu4tgx;x0?vNn%KvMni}goU}(<=Zj%7ZDrOm zV)(llE{Z`7Gplee-KufU7QwzIhMi*gUJMt+pvE5Li9K1|O&5cxDr>di!y)cgiQ#b? z#A{n-Ho8vJwlx}$zX#C+#thY7!0|m`nGbkJ4>jhdcZm#aBL6Q%-;P>K&|U?|H6G%$3x4`^U$nh$7TXr>QnU}&BXXke(?2Q)C$ z-~$>MYViRL3@yJE%rP*u#s@Sow7~~7F!Zz!Xkci!4`^WMfDdS3=$H>^VCY>H($tTmS$7 delta 6613 zcmb7|dstOf7Qos2E)@_Iya)*Q@(_W*2cROB&kM-=YAEI-MGz6Zh(Jh~=Hu#^n3#`) zt}!*meB?75Kf6=oTg$$_Qf#s`HRbDJH6>b2la)AYpL5`z>*0bPT?!1A-g5s^+SSQ?LNzCi2^qr?k`o-&A>8s8r>5BK)(q~?#BcFANX3{ZllVtY!P&(1YBz5#XFZcPa z?|0I7)OSm_Z=?@=%}0JVKEb4qx|yZy?$@O&Mw67{e^iR=Zj$^0nxtrd)B3?|B1r9PVi7{m_x#1uD452DeLY@}J{PA6Uxq-`?PZ zacfvljP3&6aj_8^c$x*n5AohO(hmYy3NASdQMkwtqWAC5B%T@R*cQv zA(>ZLa8)*h@G1*7ALSNaZb9ENFycB-D8xVfA)Ew50^k%cwqR8#k4K+C=&W>Pbi@UM zr-5%I6KhyJmIXV>i^1>{7qh8%VXFPI^kAC^r-NG%ny{D4LBHHP4rMG(y1#w8&z zjf8K7z+S#nweycq=*;shQuEnpYzc*F{3?0ISsz@q7Ti7hG{R@E;xUTJX%(GtK|wL(@)Z?bZbKLz2fgrA4D{eFD*L{T$s?T{3o*Q7iW~%Hz}$Wi zOOqly)|ia;%`nK3(i=(`k5XkKWEmcoBCGfA3qSE;Dtf{NZRnBSjPB7aL|Z)zFJzBH-m zL#692iZ8cQuxbM&xQM4Ef|a+b;=i|vyLd8g5EL;UlBxtQ810Xfn}MlvP?82iHBVGg ze-|V>lB1%`HY8?g7%x&k7qoqrJ1ctv`ttcIs%=O2cEzwp3?D*PZ(F3?9wE}*Bko-~ z)RgwMjdXWYwh7vYzzlW&JANAi9T?lYaTpAxPX+TJ0P}{yMD9q%TNWOJ7Y6Vk_rb~z zlexd6>u|W=nV(k$ZdHQ`$7In2C>h03mIWb<-$+$*$<0)eOXQ8tt_e3o<&Dn2O+|YM zB%&<`!uhAExH<=_#ItZBn9*#7$M{3iG#Liq+A*~GiL9`~K=DlI&rNvM3en6L*=X8Y z^euTbq_?v@qruD~+FAX6Zfa*E$54_hvVT^R%DZC9fgc@Cd6UmU*QQx(qnvUt4R_lhiv6TfZBbNl|12s5mX|>cJLh=447$oje-`jz6VHcm5u9Pb+ zbDd6ftARu81)*Y9QyNA*3zd9hx{{p4M(3}n1L|5!iW@PwmNx%Jc|cw}oIc6()5zac zta$Cz5C^5&WsGKk?DWR2^C3(zW^|;?hj;`3I9-{|mObE(n-)SZ{&_l{Tu8HZi%e5> z#;&7J8Pwp`MdS`i%iH&3#f5zuL!!(x?ulOv9IF?CFIG;3E;zodd4#|l#P3IIV}hE^tiK}@(Y`U_7#xF3)Svg>qvDH`q#l^ zdQ{cX`Bbezd{PD;e24cCPdSsnsU5#K(;UyOge=BasRr-7NVTkLu-7VZXKXVj*l8qL z^|WbT#magZL7xuS(?V9sH}w#~j<%7U)pW+Gq;55l^KIn&)l@ib&QK}OPVsqlRl zXhNuC-w4L5CnNIFKkdm$(^ z>x>ATAdkw+A92H9A(wk+DH|Yo6J=cLBezFS;bgOmA6qxkqKn60H_@%7KPiZdQNJwZ zVl;FS<-;p4(~QSv(S%0v=qx$9G(J&?9mFM@VKy%&X>pgBE<*}8YlH|a-vSHy^D27I z1(j{3`T+6CZ7?EG%6fDFV!%?C7%<>as;#>zRC-AYRS~|un~JM?O_#hVTOBzoaC*SvaQhxuz+OkwYqVwG zP=|kzd=gwNx0i~{H9$0Vy@`a9^N9ZY^&z=Wq ziRW3cWq~0UmmPu^@ccno?K>#PwZMyX6d!^FMrW|)1em4DJwBqy!waudj*P@lUxyNL z4NsUI;Fxe28i9%{A%`RDXsgE_mGiqeZXB6M;R9e(oz&~HCXvZUl4IuEP+{O@Im+OE zI!^hd2F)iZnbr!uWL}+vD_(=Sc>e??uj(qu`4JbOIdHBl-Iw|QFj3D<e;VIBBbJ_l 
zZp<4OoT2h81@R0ef$L7A!Dk_eU(La!vyh?u#FE2)cH&5p~@|lYA%DFQuDl~bli&fc4o>nY5O&%Y6hspwuH{O8>oLd!FQ(M|^ zDP%#o`og=A$Ws)zo8G0Yl4@0c0GfZEe&?~^`tvXjC%;4G=l%20&3&A3BG_he7~g}> zI9u^;RUFg)Xt)TW;TME9MXF|cy>D&BGc$@RYUa%C zTRv+}&D`SZ2HQm|rd*`M?v|>v=pvmV7jXMUNDTgvRR2`zrWNmAr0FlmpeFd5jYM_{ zjHSa#+;*9hj>qg-C8cwVpDmo-VDn6)ue(#Jc~vxbY0uk!5KT5MWm|@5?uf?UKuwb9 zVz79dA>L-vTTQ*koWk|S>M)AG~LA5TE*LG@piC%yaqbil<`L9jHcHWV${{5IWC&L?LoLI z-sB+M6K^3R5}s}0uvx_0WYLVLrXFU^7%&ij?Pu4{&8{tNf&fouQ( diff --git a/tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree b/tamingllms/_build/.doctrees/notebooks/output_size_limit.doctree index c77c17e702e7870f3b4df88f0df03d354ab02767..8fce36f825397723eb3535856da74e669481654a 100644 GIT binary patch delta 176 zcmV;h08jtBpauS*1+ZBG0XDN*0e%FN%my5j^aa?H)CU}s=mzYQ(+C`s@CV|P#|a&i z?+DqGfu zk`EgJlcf#nN9(6 L0RcOg3{U}bcw0FO diff --git a/tamingllms/_build/.doctrees/notebooks/structured_output.doctree b/tamingllms/_build/.doctrees/notebooks/structured_output.doctree index 43a99ab948cdd8faffd2514db8f4a8472d1df9e9..74d7f5e03e279733a47b5c7dfea91e067cf96f97 100644 GIT binary patch delta 230 zcmV(2(W7b0W`C00dxeD%LW{i`~}aGv4>XYCF9Fyk;&6CRp9Fx!o$&<7P9Fw&O zw3C?#9Fv6!+>>Yt9g}_v%adXX9Ft`W`;$ov9Fu$u^pju=9J7oKAOVwb4IPuo4$hOT z4jhxS56qL74;-_b5E%iJe-Ip#&=JIwv=JPWuoB;sloA}1s}tsvjuRcTt`sl

-

4. The Evals Gap

+

4. The Evals Gap

It doesn’t matter how beautiful your theory is,
it doesn’t matter how smart you are.
@@ -203,45 +203,45 @@

Contents

-

4.1. Non-Deterministic Generative Machines

+

4.1. Non-Deterministic Generative Machines

One of the most fundamental challenges when building products with Large Language Models (LLMs) is their generative and non-deterministic nature. Unlike traditional software systems where the same input reliably produces the same output, LLMs can generate novel text that may not exist in their training data, and produce different responses each time they’re queried - even with identical prompts and input data. This behavior is both a strength and a significant engineering and product challenge.

When you ask an LLM the same question multiple times, you’ll likely get different responses. This isn’t a bug - it’s a fundamental feature of how these models work. The “temperature” parameter, which controls the randomness of outputs, allows models to be creative and generate diverse responses. However, this same feature makes it difficult to build reliable, testable systems.
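The effect is easy to observe directly. The snippet below is a minimal sketch, assuming the openai Python package, an OPENAI_API_KEY in the environment, and an illustrative model name; calling the same prompt twice at temperature 1.0 will typically return two different answers, while temperature 0 makes the output far more repeatable (though still not guaranteed to be identical).

from openai import OpenAI

client = OpenAI()  # reads OPENAI_API_KEY from the environment

def ask(prompt: str, temperature: float = 1.0) -> str:
    # One chat completion call; only the sampling temperature varies below.
    response = client.chat.completions.create(
        model="gpt-4o-mini",  # illustrative model name
        messages=[{"role": "user", "content": prompt}],
        temperature=temperature,
    )
    return response.choices[0].message.content

prompt = "In one sentence, should a new retiree favor stocks or bonds?"
print(ask(prompt))                   # run 1: creative sampling
print(ask(prompt))                   # run 2: often a different answer
print(ask(prompt, temperature=0.0))  # near-deterministic output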

Consider a financial services company using LLMs to generate investment advice. The non-deterministic nature of these models means that:

@@ -252,16 +252,16 @@

-

4.1.1. Temperature and Sampling

+

4.1.1. Temperature and Sampling

The primary source of non-determinism in LLMs comes from their sampling strategies. During text generation, the model:

  1. Calculates probability distributions for each next token

  2. Samples from these distributions based on temperature settings

  3. Uses techniques like nucleus sampling [Holtzman et al., 2020] or top-k sampling to balance creativity and coherence
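A minimal, self-contained sketch of that sampling step is shown below using NumPy only; the vocabulary and logits are made up for illustration, but the temperature, top-k and nucleus (top-p) filters mirror the mechanics described above.

import numpy as np

rng = np.random.default_rng(0)
vocab = ["bonds", "stocks", "cash", "gold", "crypto"]
logits = np.array([2.0, 1.5, 0.3, -0.5, -1.0])  # one score per candidate token

def sample_next_token(logits, temperature=1.0, top_k=None, top_p=None):
    # 1. Temperature scaling: lower values sharpen the distribution.
    scaled = logits / max(temperature, 1e-8)
    probs = np.exp(scaled - scaled.max())
    probs /= probs.sum()
    # 2. Optional top-k: keep only the k most probable tokens.
    if top_k is not None:
        cutoff = np.sort(probs)[-top_k]
        probs = np.where(probs >= cutoff, probs, 0.0)
    # 3. Optional nucleus (top-p): keep the smallest set of tokens whose mass reaches p.
    if top_p is not None:
        order = np.argsort(probs)[::-1]
        cumulative = np.cumsum(probs[order])
        keep = order[: int(np.searchsorted(cumulative, top_p)) + 1]
        filtered = np.zeros_like(probs)
        filtered[keep] = probs[keep]
        probs = filtered
    # 4. Renormalize and sample an index, then map it back to a token.
    probs /= probs.sum()
    return vocab[rng.choice(len(vocab), p=probs)]

print([sample_next_token(logits, temperature=1.0, top_p=0.9) for _ in range(5)])
print([sample_next_token(logits, temperature=0.1) for _ in range(5)])  # nearly always the top token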

-

4.1.2. The Temperature Spectrum

+

4.1.2. The Temperature Spectrum

  • Temperature = 0: Most deterministic, but potentially repetitive

  • Temperature = 1: Balanced creativity and coherence

  • @@ -360,19 +360,19 @@

    -

    4.2. Emerging Properties

    +

    4.2. Emerging Properties

    Beyond their non-deterministic nature, LLMs present another fascinating challenge: emergent abilities that spontaneously arise as models scale up in size. These abilities - from basic question answering to complex reasoning - aren’t explicitly programmed but rather emerge “naturally” as the models grow larger and are trained on more data. This makes evaluation fundamentally different from traditional software testing, where capabilities are explicitly coded and can be tested against clear specifications.

    Emerging Properties
    -

    Fig. 4.1 Emergent abilities of large language models and the scale [Wei et al., 2022].

    +

    Fig. 4.1 Emergent abilities of large language models and the scale [Wei et al., 2022].

    Fig. 4.1 provides a list of emergent abilities of large language models and the scale. The relationship between model scale and emergent abilities follows a fascinating non-linear pattern. Below certain size thresholds, specific abilities may be completely absent from the model - it simply cannot perform certain tasks, no matter how much you try to coax them out. However, once the model reaches critical points in its scaling journey, these abilities can suddenly manifest in what researchers call a phase transition - a dramatic shift from inability to capability. This unpredictable emergence of capabilities stands in stark contrast to traditional software development, where features are deliberately implemented and can be systematically tested.

    The implications for evaluation are pressing. While conventional software testing relies on stable test suites and well-defined acceptance criteria, LLM evaluation must contend with a constantly shifting landscape of capabilities. What worked to evaluate a 7B parameter model may be completely inadequate for a 70B parameter model that has developed new emergent abilities. This dynamic nature of LLM capabilities forces us to fundamentally rethink our approach to testing and evaluation.

-

4.3. Problem Statement

+

4.3. Problem Statement

Consider a practical example that illustrates these challenges: building a Math AI tutoring system for children powered by an LLM. In traditional software development, you would define specific features (like presenting math problems or checking answers) and write tests to verify each function. But with LLMs, you’re not just testing predefined features - you’re trying to evaluate emergent capabilities like adapting explanations to a child’s level, maintaining engagement through conversational learning, and providing age-appropriate safety-bound content.

This fundamental difference raises critical questions about evaluation:

    @@ -422,7 +422,7 @@

    -

    4.4. Evals Design

    +

    4.4. Evals Design

First, it’s important to make a distinction between evaluating an LLM and evaluating an LLM-based application. While the former offers foundation capabilities and is typically general-purpose, the latter is more specific and tailored to a particular use case. Here, we define an LLM-based application as a system that uses one or more LLMs to perform a specific task. More specifically, an LLM-based application is the combination of one or more LLM models and their associated prompts and parameters to solve a particular business problem.
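To make that unit of evaluation concrete, the short sketch below (names are illustrative, not taken from any particular library) treats an application as the bundle of model identifier, prompt template and generation parameters; changing any one of these yields a different application to evaluate, even if the underlying LLM is unchanged.

from dataclasses import dataclass, field

@dataclass(frozen=True)
class LLMApplication:
    model: str                 # e.g. "gpt-4o-mini" or a local checkpoint path
    prompt_template: str       # the prompt the application actually ships with
    temperature: float = 0.0
    max_tokens: int = 256
    extra_params: dict = field(default_factory=dict)

    def render(self, **inputs) -> str:
        # The rendered prompt is part of the system under test, not just the model.
        return self.prompt_template.format(**inputs)

math_tutor_v1 = LLMApplication(
    model="gpt-4o-mini",
    prompt_template="You are a patient math tutor for children. Question: {question}",
    temperature=0.3,
)
print(math_tutor_v1.render(question="What is 3/4 of 12?"))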

    That differentiation is important because it changes the scope of evaluation. LLMs are usually evaluated based on their capabilities, which include things like language understanding, reasoning and knowledge. LLM-based applications, instead, should be evaluated based on their end-to-end functionality, performance, and how well they meet business requirements. That distinction has key implications for the design of evaluation systems:

      @@ -509,7 +509,7 @@

      -

      4.4.1. Conceptual Overview

      +

      4.4.1. Conceptual Overview

      Fig. 4.2 demonstrates a conceptual design of key components of LLM Application evaluation.

      Conceptual Overview @@ -590,7 +590,7 @@

      -

      4.4.2. Design Considerations

      +

      4.4.2. Design Considerations

      The design of an LLM application evaluation system depends heavily on the specific use case and business requirements. Here we list important questions for planning an LLM application evaluation system pertaining to each of the key components previously introduced:

      1. Examples (Input Dataset):

        @@ -675,7 +675,7 @@

        -

        4.5. Metrics

        +

        4.5. Metrics

        The choice of metric depends on the specific task and desired evaluation criteria. However, one can categorize metrics into two broad categories: intrinsic and extrinsic.

        • Intrinsic metrics focus on the model’s performance on its primary training objective, which is typically to predict the next token in a sequence. Perplexity is a common intrinsic metric that measures how well the model predicts a given sample of text.

        • @@ -985,11 +985,11 @@
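As a worked illustration of the intrinsic case, perplexity can be computed directly from the per-token probabilities a model assigns to a sample of text; the probabilities below are made up for illustration.

import math

# p(token_i | previous tokens) for each token of a sample text (illustrative values)
token_probs = [0.42, 0.11, 0.63, 0.05, 0.27]

negative_log_likelihoods = [-math.log(p) for p in token_probs]
perplexity = math.exp(sum(negative_log_likelihoods) / len(negative_log_likelihoods))
print(f"perplexity = {perplexity:.2f}")  # lower is better; 1.0 would mean perfect prediction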

          4.6. Evaluators

          +

          4.6. Evaluators

          -

          4.6.1. Model-Based Evaluation

          +

          4.6.1. Model-Based Evaluation

          Traditional metrics like BLEU or ROUGE often fall short in capturing the nuanced, contextual, and creative outputs of LLMs. As an alternative we can consider a “Model-based evaluation” approach. A common approach is to use an LLM as a judge. This is an approach that leverages language models themselves to assess the quality of outputs from other language models. This method involves using a model (often a more capable one) to act as an automated judge, evaluating aspects like accuracy, coherence, and relevance of generated content. Unlike traditional metrics that rely on exact matching or statistical measures, model-based evaluation can capture nuanced aspects of language and provide more contextual assessment.
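A minimal LLM-as-a-judge sketch is shown below. It assumes the openai package and an API key in the environment; the judge model, rubric and 1-5 scale are illustrative choices rather than a fixed standard, and production use would add retries and stricter output validation.

import json
from openai import OpenAI

client = OpenAI()

JUDGE_PROMPT = """You are an impartial evaluator.
Rate the ANSWER to the QUESTION for accuracy, coherence and relevance,
each as an integer from 1 (poor) to 5 (excellent).
Return only JSON, e.g. {{"accuracy": 5, "coherence": 5, "relevance": 5}}.

QUESTION: {question}
ANSWER: {answer}
"""

def judge(question: str, answer: str) -> dict:
    response = client.chat.completions.create(
        model="gpt-4o",  # typically a more capable model than the one being evaluated
        messages=[{"role": "user", "content": JUDGE_PROMPT.format(question=question, answer=answer)}],
        temperature=0,
    )
    # Will raise if the judge does not return valid JSON - acceptable for a sketch.
    return json.loads(response.choices[0].message.content)

print(judge("What is the capital of France?", "Paris is the capital of France."))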

          -

          As discussed in the paper [Li et al., 2024], LLM-based evaluation approaches generally fall into two main categories:

          +

          As discussed in the paper [Li et al., 2024], LLM-based evaluation approaches generally fall into two main categories:

          1. Prompt-based evaluation: This involves using prompts to instruct existing LLMs to evaluate text quality without any fine-tuning. The evaluation can take several forms:

            The visualization helps highlight these differences across models and evaluation dimensions. A clear performance gradient is visible from gpt-4o-mini to gpt-3.5-turbo, with the latter showing marked degradation in most metrics.

            -

            Leveraging LLMs for evaluation has several limitations [Li et al., 2024]. Firstly, computational overhead should not be neglected given the inherent cost of running additional model inferences iterations. LLM evaluators can also exhibit various biases, including order bias (preferring certain sequence positions), egocentric bias (favoring outputs from similar models), and length bias. Further, there may be a tight dependency on prompt quality - small prompt variations may lead to substantially different outcomes. It is important to also note challenges around domain-specific evaluation in fields such as medice, finance, law etc, where a general llm-as-a-judge approach may not be suitable.

            +

Leveraging LLMs for evaluation has several limitations [Li et al., 2024]. Firstly, computational overhead should not be neglected given the inherent cost of running additional model inference iterations. LLM evaluators can also exhibit various biases, including order bias (preferring certain sequence positions), egocentric bias (favoring outputs from similar models), and length bias. Further, there may be a tight dependency on prompt quality - small prompt variations may lead to substantially different outcomes. It is important to also note challenges around domain-specific evaluation in fields such as medicine, finance, law, etc., where a general LLM-as-a-judge approach may not be suitable.
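Some of these biases can be probed cheaply. The sketch below checks for order (position) bias by asking a pairwise judge to compare the same two answers in both orderings and measuring how often the verdict flips; pairwise_judge is assumed to be a user-supplied function returning "A" or "B" (for instance, an LLM-as-a-judge prompt that asks which of two candidate answers is better).

def order_bias_rate(pairwise_judge, examples):
    """examples: list of (question, answer_1, answer_2) tuples."""
    inconsistent = 0
    for question, ans1, ans2 in examples:
        first = pairwise_judge(question, ans1, ans2)   # answer_1 presented as "A"
        second = pairwise_judge(question, ans2, ans1)  # answer_1 presented as "B"
        # A position-insensitive judge picks the same underlying answer both times.
        if (first == "A") != (second == "B"):
            inconsistent += 1
    return inconsistent / len(examples)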

The LLM-as-a-Judge strategy can serve as a scalable and nuanced solution to evaluate LLM-based applications. While it does not entirely replace metrics-based or human-based approaches, it significantly augments evaluation workflows, especially in scenarios requiring evaluation of generative outputs. Future improvements could include integrating human oversight and refining LLMs for domain-specific evaluation tasks.

          -

          4.6.2. Human-Based Evaluation

          +

          4.6.2. Human-Based Evaluation

          Human assessors can judge aspects like fluency, coherence, and factual accuracy, providing a more comprehensive evaluation. However, human evaluation can be subjective and resource-intensive.

          -

          4.6.3. Evaluating Evaluators

          +

          4.6.3. Evaluating Evaluators

We have discussed how LLMs can be used to evaluate LLM-based applications. However, how can we evaluate the performance of LLMs that evaluate other LLMs? This is the question that meta evaluation aims to answer. Clearly, the discussion can become quite meta, as we need to evaluate the performance of the evaluator in order to evaluate the performance of the evaluated model. However, one can make a case for two general options:

          1. Use a gold-standard dataset that is used to evaluate the performance of LLM evaluators using a “metrics-based” approach.

          2. @@ -1248,7 +1248,7 @@

            Fig. 4.5 Conceptual overview of LLMs Meta Evaluation.
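For the first, metrics-based option, meta evaluation reduces to comparing the judge's scores against gold human ratings on a held-out set. A small sketch with made-up ratings:

human_scores = [5, 2, 4, 1, 3, 4]  # gold labels from human annotators (1-5), illustrative
judge_scores = [4, 2, 5, 1, 3, 3]  # scores from the LLM judge on the same outputs

matches = [h == j for h, j in zip(human_scores, judge_scores)]
exact_agreement = sum(matches) / len(matches)
mean_abs_error = sum(abs(h - j) for h, j in zip(human_scores, judge_scores)) / len(human_scores)
print(f"exact agreement: {exact_agreement:.2f}, mean absolute error: {mean_abs_error:.2f}")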

      -

      An alternative to the above approaches is to use humans to directly evaluate the LLM-judges themselves. A notable example of this is Judge Arena [Arena, 2024], which is a platform that allows users to vote on which AI model made the better evaluation. Under this approach, the performance of the LLM evaluator is given by the (blind) evaluation of humans who perform the voting on randomly generated pairs of LLM judges as depicted in Fig. 4.6. Only after submitting a vote, users can see which models were actually doing the judging.

      +

      An alternative to the above approaches is to use humans to directly evaluate the LLM-judges themselves. A notable example of this is Judge Arena [Arena, 2024], which is a platform that allows users to vote on which AI model made the better evaluation. Under this approach, the performance of the LLM evaluator is given by the (blind) evaluation of humans who perform the voting on randomly generated pairs of LLM judges as depicted in Fig. 4.6. Only after submitting a vote, users can see which models were actually doing the judging.

      Human-in-the-loop meta evaluation Conceptual Overview
      @@ -1275,18 +1275,18 @@

      -

      4.7. Benchmarks and Leaderboards

      +

      4.7. Benchmarks and Leaderboards

      Benchmarks act as standardized tests for LLMs, evaluating their performance across a spectrum of tasks. These tasks simulate real-world applications such as answering questions, generating coherent text, solving mathematical problems, or even writing computer code. They also assess more abstract qualities like fairness, robustness, and cultural understanding.

Benchmarks can be thought of as comprehensive “exams” that probe different “subjects” in order to certify an LLM. They help researchers and developers compare models systematically, in a way that makes LLM performance comparable while enabling the identification of emergent behaviors or capabilities as models evolve in scale and sophistication.

      -

      The history of LLM benchmarks reflects the evolving priorities of artificial intelligence research, starting with foundational tasks and moving toward complex, real-world challenges. It began in 2018 with the introduction of GLUE(General Language Understanding Evaluation) [Wang et al., 2019], which set a new standard for evaluating natural language understanding. GLUE measured performance on tasks like sentiment analysis and textual entailment, providing a baseline for assessing the fundamental capabilities of language models. A year later, SuperGLUE [Wang et al., 2019] expanded on this foundation by introducing more nuanced tasks that tested reasoning and language comprehension at a deeper level, challenging the limits of models like BERT and its successors.

      -

      As AI capabilities grew, benchmarks evolved to capture broader and more diverse aspects of intelligence. BIG-Bench [Srivastava et al., 2023] marked a turning point by incorporating over 200 tasks, spanning arithmetic, logic, and creative problem-solving. This collaborative effort aimed to probe emergent abilities in large models, offering insights into how scale and complexity influence performance. Around the same time, specialized benchmarks like TruthfulQA [Lin et al., 2022] emerged, addressing the critical need for models to provide accurate and non-deceptive information in a world increasingly dependent on AI for factual content.

      -

      MMLU (Massive Multitask Language Understanding) [Hendrycks et al., 2021] launched in 2021, provided a rigorous test of a model’s multidisciplinary knowledge, covering 57 subjects from STEM fields to humanities and social sciences. Similarly, in 2022, Stanford’s HELM (Holistic Evaluation of Language Models) [Liang et al., 2023] set a new standard for multidimensional assessment. HELM expanded the scope of evaluation beyond accuracy, incorporating factors like fairness, robustness, and computational efficiency. This benchmark was designed to address societal concerns surrounding AI, emphasizing safety and inclusion alongside technical performance.

      -

      Specialized benchmarks like HumanEval (2021) [Chen et al., 2021] focused on domain-specific tasks, such as code generation, testing models’ ability to translate natural language descriptions into functional programming code. In contrast, LMSYS (2023) brought real-world applicability into focus by evaluating conversational AI through multi-turn dialogues. LMSYS prioritized coherence, contextual understanding, and user satisfaction, providing a practical lens for assessing models like GPT and Claude in dynamic settings.

      -

      The HuggingFace Open LLM [Face, 2024] Leaderboard stands out for its transparency and accessibility in the open-source community. This leaderboard evaluates a wide range of LLMs across diverse tasks, including general knowledge, reasoning, and code-writing. Its commitment to reproducibility ensures that results are verifiable, enabling researchers and practitioners to replicate findings. By focusing on open-source models, it democratizes AI research and fosters innovation across communities, making it a valuable resource for both academics and industry professionals.

      -

      The Chatbot Arena (2024) Leaderboard (an evolution of LMSYS)[Chiang et al., 2024] takes an alternative approach by measuring real-world performance through direct model comparisons. Its evaluation format compares models in live conversations, with human judges providing qualitative assessments. This methodology has gathered over 200,000 human evaluations, offering specific insights into practical model performance. The emphasis on interactive capabilities makes it relevant for developing user-facing applications like virtual assistants and chatbots.

      -

      The AlpacaEval [Dubois et al., 2024] and MT-Bench [Zheng et al., 2023] Leaderboards implement automated evaluation using GPT-4 to assess model performance in multi-turn conversations. This approach enables consistent assessment of dialogue capabilities while reducing human bias. Their methodology measures key aspects of conversational AI, including contextual understanding and response consistency across multiple exchanges.

      -

      A major challenge with these leaderboards and benchmarks is test set contamination - when test data ends up in newer models’ training sets, rendering the benchmarks ineffective. While some benchmarks try to address this through crowdsourced prompts and evaluations from humans or LLMs, these approaches introduce their own biases and struggle with difficult questions. LiveBench [White et al., 2024] represents a novel solution, designed specifically to be resilient to both contamination and evaluation biases. As the first benchmark with continuously updated questions from recent sources, automated objective scoring, and diverse challenging tasks across multiple domains, LiveBench maintains its effectiveness even as models improve. Drawing from recent math competitions, research papers, news, and datasets, it creates contamination-free versions of established benchmark tasks. Current results show even top models achieving below 70% accuracy, demonstrating LiveBench’s ability to meaningfully differentiate model capabilities. With monthly updates and an open collaborative approach, LiveBench aims to provide sustained value for model evaluation as the field advances.

      -

      A significant shift in AI evaluation came with the launch of the The Alignment Research Center (ARC) Prize [Chollet, 2024] by ARC Prize Inc., a non-profit for the public advancement of open artificial general intelligence. Hosted by Mike Knoop (Co-founder, Zapier) and François Chollet (Creator of ARC-AGI, Keras), this prize represents a paradigm shift in how we evaluate language models. Rather than focusing on narrow performance metrics, the ARC Prize assesses what it calls “cognitive sufficiency” - a model’s ability to generate meaningful insights and tackle open-ended challenges. This new way to think about LLM evaluation emphasizes creative thinking, sophisticated reasoning, and the capacity to make genuinely useful contributions to human knowledge as we seek to define and measure what it means to achieve AGI (Artificial General Intelligence).

      +

      The history of LLM benchmarks reflects the evolving priorities of artificial intelligence research, starting with foundational tasks and moving toward complex, real-world challenges. It began in 2018 with the introduction of GLUE(General Language Understanding Evaluation) [Wang et al., 2019], which set a new standard for evaluating natural language understanding. GLUE measured performance on tasks like sentiment analysis and textual entailment, providing a baseline for assessing the fundamental capabilities of language models. A year later, SuperGLUE [Wang et al., 2019] expanded on this foundation by introducing more nuanced tasks that tested reasoning and language comprehension at a deeper level, challenging the limits of models like BERT and its successors.

      +

      As AI capabilities grew, benchmarks evolved to capture broader and more diverse aspects of intelligence. BIG-Bench [Srivastava et al., 2023] marked a turning point by incorporating over 200 tasks, spanning arithmetic, logic, and creative problem-solving. This collaborative effort aimed to probe emergent abilities in large models, offering insights into how scale and complexity influence performance. Around the same time, specialized benchmarks like TruthfulQA [Lin et al., 2022] emerged, addressing the critical need for models to provide accurate and non-deceptive information in a world increasingly dependent on AI for factual content.

      +

      MMLU (Massive Multitask Language Understanding) [Hendrycks et al., 2021] launched in 2021, provided a rigorous test of a model’s multidisciplinary knowledge, covering 57 subjects from STEM fields to humanities and social sciences. Similarly, in 2022, Stanford’s HELM (Holistic Evaluation of Language Models) [Liang et al., 2023] set a new standard for multidimensional assessment. HELM expanded the scope of evaluation beyond accuracy, incorporating factors like fairness, robustness, and computational efficiency. This benchmark was designed to address societal concerns surrounding AI, emphasizing safety and inclusion alongside technical performance.

      +

      Specialized benchmarks like HumanEval (2021) [Chen et al., 2021] focused on domain-specific tasks, such as code generation, testing models’ ability to translate natural language descriptions into functional programming code. In contrast, LMSYS (2023) brought real-world applicability into focus by evaluating conversational AI through multi-turn dialogues. LMSYS prioritized coherence, contextual understanding, and user satisfaction, providing a practical lens for assessing models like GPT and Claude in dynamic settings.

      +

      The HuggingFace Open LLM [Face, 2024] Leaderboard stands out for its transparency and accessibility in the open-source community. This leaderboard evaluates a wide range of LLMs across diverse tasks, including general knowledge, reasoning, and code-writing. Its commitment to reproducibility ensures that results are verifiable, enabling researchers and practitioners to replicate findings. By focusing on open-source models, it democratizes AI research and fosters innovation across communities, making it a valuable resource for both academics and industry professionals.

      +

      The Chatbot Arena (2024) Leaderboard (an evolution of LMSYS)[Chiang et al., 2024] takes an alternative approach by measuring real-world performance through direct model comparisons. Its evaluation format compares models in live conversations, with human judges providing qualitative assessments. This methodology has gathered over 200,000 human evaluations, offering specific insights into practical model performance. The emphasis on interactive capabilities makes it relevant for developing user-facing applications like virtual assistants and chatbots.

      +

      The AlpacaEval [Dubois et al., 2024] and MT-Bench [Zheng et al., 2023] Leaderboards implement automated evaluation using GPT-4 to assess model performance in multi-turn conversations. This approach enables consistent assessment of dialogue capabilities while reducing human bias. Their methodology measures key aspects of conversational AI, including contextual understanding and response consistency across multiple exchanges.

      +

      A major challenge with these leaderboards and benchmarks is test set contamination - when test data ends up in newer models’ training sets, rendering the benchmarks ineffective. While some benchmarks try to address this through crowdsourced prompts and evaluations from humans or LLMs, these approaches introduce their own biases and struggle with difficult questions. LiveBench [White et al., 2024] represents a novel solution, designed specifically to be resilient to both contamination and evaluation biases. As the first benchmark with continuously updated questions from recent sources, automated objective scoring, and diverse challenging tasks across multiple domains, LiveBench maintains its effectiveness even as models improve. Drawing from recent math competitions, research papers, news, and datasets, it creates contamination-free versions of established benchmark tasks. Current results show even top models achieving below 70% accuracy, demonstrating LiveBench’s ability to meaningfully differentiate model capabilities. With monthly updates and an open collaborative approach, LiveBench aims to provide sustained value for model evaluation as the field advances.

      +

A significant shift in AI evaluation came with the launch of the ARC (Abstraction and Reasoning Corpus) Prize [Chollet, 2024] by ARC Prize Inc., a non-profit for the public advancement of open artificial general intelligence. Hosted by Mike Knoop (Co-founder, Zapier) and François Chollet (Creator of ARC-AGI, Keras), this prize represents a paradigm shift in how we evaluate language models. Rather than focusing on narrow performance metrics, the ARC Prize assesses what it calls “cognitive sufficiency” - a model’s ability to generate meaningful insights and tackle open-ended challenges. This new way of thinking about LLM evaluation emphasizes creative thinking, sophisticated reasoning, and the capacity to make genuinely useful contributions to human knowledge as we seek to define and measure what it means to achieve AGI (Artificial General Intelligence).

      Defining AGI according to ARC Prize:

      Consensus but wrong:

      @@ -1315,13 +1315,14 @@

      [Chollet, 2024]. While deep learning has significantly advanced in recent years, pure deep learning approaches perform poorly on the ARC-AGI benchmark. This is because traditional deep learning relies on relating new situations to those encountered during training and lacks the ability to adapt or recombine knowledge for entirely new tasks. ARC Prize 2024 spurred the development of novel AGI reasoning techniques, leading to a significant increase in the state-of-the-art score on the ARC-AGI private evaluation set from 33% in 2023 to 55.5% in 2024. A key takeaway is that algorithmic improvements, rather than massive computational resources, may be key to exceeding the target score for the ARC-AGI benchmark.

      As language models continue to advance in capability and complexity, evaluation frameworks must evolve. Modern benchmarks increasingly incorporate tests for nuanced reasoning, ethical decision-making, and emergent capabilities that weren’t previously measurable. This ongoing evolution reflects a deeper understanding that the true value of language models lies not in achieving high scores on standardized tests with narrow task-specific metrics, but in their ability to meaningfully contribute to human understanding and help solve real-world problems while demonstrating the ability to learn and adapt to new tasks.

-

4.8. Tools

+

4.8. Tools

-

4.8.1. LightEval

-

LightEval [Fourrier et al., 2023] is a lightweight framework for evaluation of LLMs across a variety of standard and bespoke metrics and tasks across multiple inference backends via Python SDK and CLI.

+

4.8.1. LightEval

+

LightEval [Fourrier et al., 2023] is a lightweight framework for evaluation of LLMs across a variety of standard and bespoke metrics and tasks across multiple inference backends via Python SDK and CLI.

As a motivating example, consider a scenario where financial data has been extracted from SEC financial filings and requires econometric analysis. Tasks like estimating autoregressive models for time series forecasting or conducting hypothesis tests on market efficiency are common in financial analysis. Let’s evaluate how well different models perform on this type of task.

First, we need to select a benchmark to assess LLMs capabilities in this domain. MMLU has a sub-benchmark called Econometrics we can use for this task. Table 4.4 shows a sample of the benchmark dataset from MMLU Econometrics. It consists of multiple-choice questions from econometrics and expected answers.

@@ -1412,13 +1413,13 @@

return pipeline

Fig. 4.8 shows a schematic representation of its key components. As the inference engine, we leverage accelerate for distributed evaluation; lighteval also supports other inference backends such as vllm and tgi.

First, we instantiate an EvaluationTracker which manages result storage, in this example kept in a local directory output_dir, and tracks detailed evaluation metrics, optionally pushed to HuggingFace Hub.

Next, we instantiate a PipelineParameters object which, in this example, configures the pipeline for parallel processing with a temporary cache in cache_dir and caps the number of samples to process at max_samples. Then, in BaseModelConfig, we specify the LLM we would like to evaluate via its pretrained identifier.
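Putting these pieces together, a minimal sketch of the pipeline construction is shown below. The class and parameter names (EvaluationTracker, PipelineParameters, BaseModelConfig, Pipeline, output_dir, max_samples, pretrained) follow the description above, but the exact import paths and signatures are assumptions that may vary across lighteval releases; treat this as illustrative rather than a drop-in script.

# Illustrative sketch only; import paths and argument names may differ across lighteval versions.
from lighteval.logging.evaluation_tracker import EvaluationTracker
from lighteval.models.model_config import BaseModelConfig
from lighteval.pipeline import ParallelismManager, Pipeline, PipelineParameters

def create_evaluation_pipeline(task: str, pretrained: str,
                               output_dir: str = "./evals", max_samples: int = 10) -> Pipeline:
    # Stores results in a local directory; metrics can optionally be pushed to the Hugging Face Hub.
    evaluation_tracker = EvaluationTracker(output_dir=output_dir)

    # Parallel processing via accelerate, capping the number of evaluated samples.
    # (The cache_dir wiring mentioned in the text is version-dependent and omitted here.)
    pipeline_params = PipelineParameters(
        launcher_type=ParallelismManager.ACCELERATE,
        max_samples=max_samples,
    )

    # The model under evaluation, identified by its Hugging Face model id.
    model_config = BaseModelConfig(pretrained=pretrained)

    pipeline = Pipeline(
        tasks=task,
        pipeline_parameters=pipeline_params,
        evaluation_tracker=evaluation_tracker,
        model_config=model_config,
    )
    return pipeline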

Fig. 4.8 LightEval Python SDK Sample Conceptual Overview.

This setup allows for systematic evaluation of language model performance on specific tasks while handling distributed computation and result tracking.

@@ -1433,7 +1434,7 @@

LightEval provides a comprehensive set of evaluation tasks [Face, 2024] and metrics [Face, 2024]. The available tasks span multiple categories and benchmarks including BigBench, MMLU, TruthfulQA, WinoGrande, and HellaSwag. The framework also supports standard NLP evaluation metrics including BLEU, ROUGE, Exact Match, F1 Score, and Accuracy.

In our case, we choose to evaluate our LLMs on the MMLU econometrics task using zero-shot learning. Hence, we define the task as follows:
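A zero-shot task in lighteval is typically referenced by a compact string naming the suite, the task, and the few-shot settings. The identifier below is an assumption based on lighteval's task-list conventions and should be checked against the available tasks [Face, 2024]:

# Assumed lighteval task identifier format: "suite|task|num_fewshot|truncate_fewshot".
# Zero-shot evaluation on the MMLU econometrics subset.
task = "leaderboard|mmlu:econometrics|0|0"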

We would like to compare the performance of multiple open source models on the MMLU econometrics task. While we could download and evaluate each model locally, we prefer instead to evaluate them on a remote server to save time and resources. LightEval enables serving the model on a TGI-compatible server/container and then running the evaluation by sending requests to the server [Face, 2024].

For that purpose, we can leverage HuggingFace Serverless Inference API (or dedicated inference API) and set a configuration file for LightEval as shown below, where <MODEL-ID> is the model identifier on HuggingFace (e.g. meta-llama/Llama-3.2-1B-Instruct) and <HUGGINGFACE-TOKEN> is the user’s HuggingFace API token.

model:
   type: "tgi"
@@ -1483,17 +1484,17 @@ 

Model Family | Description | Models | References

Llama3.2 Instruct | LLaMA architecture-based pretrained and instruction-tuned generative models | Llama-3.2-1B-Instruct, Llama-3.2-3B-Instruct | [Meta AI, 2024]

Qwen2.5 Instruct | Instruction-tuned LLMs family built by Alibaba Cloud | Qwen2.5-0.5B-Instruct, Qwen2.5-1.5B-Instruct, Qwen2.5-3B-Instruct | [Face, 2024, Hui et al., 2024, Yang et al., 2024]

SmolLM2 Instruct | Instruction-tuned family of compact language models built by HuggingFace | SmolLM2-360M-Instruct, SmolLM2-1.7B-Instruct | [Allal et al., 2024]

@@ -1506,135 +1507,139 @@

In summary, LightEval is a simple yet flexible and comprehensive framework for evaluating LLMs across a wide variety of tasks and metrics. It can serve as a first step in selecting your next LLM for a specific task given the exponential growth in number of (open source) models available [Hugging Face, 2024]. Its integration with the Hugging Face ecosystem and modular architecture make it particularly powerful for evaluating open source models. For further details, visit the official repository [Fourrier et al., 2023].

4.8.2. LangChain

4.8.3. PromptFoo

PromptFoo [PromptFoo, 2024] is a framework for evaluating the quality of prompts for LLMs.

4.9. References

[ALB+24]

Loubna Ben Allal, Anton Lozhkov, Elie Bakouch, Gabriel Martín Blázquez, Lewis Tunstall, Agustín Piqueres, Andres Marafioti, Cyril Zakka, Leandro von Werra, and Thomas Wolf. Smollm2 - with great data, comes great performance. 2024.

-
+
[Are24]

Judge Arena. Judge arena: evaluating llm outputs with llms. https://judgearena.com/, 2024. Accessed: 2024.

-
+
[CTJ+21]

Mark Chen, Jerry Tworek, Heewoo Jun, Qiming Yuan, Henrique Ponde de Oliveira Pinto, Jared Kaplan, Harri Edwards, Yuri Burda, Nicholas Joseph, Greg Brockman, Alex Ray, Raul Puri, Gretchen Krueger, Michael Petrov, Heidy Khlaaf, Girish Sastry, Pamela Mishkin, Brooke Chan, Scott Gray, Nick Ryder, Mikhail Pavlov, Alethea Power, Lukasz Kaiser, Mohammad Bavarian, Clemens Winter, Philippe Tillet, Felipe Petroski Such, Dave Cummings, Matthias Plappert, Fotios Chantzis, Elizabeth Barnes, Ariel Herbert-Voss, William Hebgen Guss, Alex Nichol, Alex Paino, Nikolas Tezak, Jie Tang, Igor Babuschkin, Suchir Balaji, Shantanu Jain, William Saunders, Christopher Hesse, Andrew N. Carr, Jan Leike, Josh Achiam, Vedant Misra, Evan Morikawa, Alec Radford, Matthew Knight, Miles Brundage, Mira Murati, Katie Mayer, Peter Welinder, Bob McGrew, Dario Amodei, Sam McCandlish, Ilya Sutskever, and Wojciech Zaremba. Evaluating large language models trained on code. 2021. URL: https://arxiv.org/abs/2107.03374, arXiv:2107.03374.

-
+
[CZS+24]

Wei-Lin Chiang, Lianmin Zheng, Ying Sheng, Anastasios Nikolas Angelopoulos, Tianle Li, Dacheng Li, Hao Zhang, Banghua Zhu, Michael Jordan, Joseph E. Gonzalez, and Ion Stoica. Chatbot arena: an open platform for evaluating llms by human preference. 2024. URL: https://arxiv.org/abs/2403.04132, arXiv:2403.04132.

-
-[Cho24] +
+[Cho24a] +

Francois Chollet. Arc prize 2024 results. ARC Prize Website, 2024. URL: https://arcprize.org/2024-results.

+
+
+[Cho24b]

Francois Chollet. Abstraction and reasoning challenge. ARC Prize Website, 2024. URL: https://arcprize.org/.

-
+
[DGLH24]

Yann Dubois, Balázs Galambosi, Percy Liang, and Tatsunori B. Hashimoto. Length-controlled alpacaeval: a simple way to debias automatic evaluators. 2024. URL: https://arxiv.org/abs/2404.04475, arXiv:2404.04475.

-
[Fac24a]

Hugging Face. Available tasks - lighteval wiki. https://github.com/huggingface/lighteval/wiki/Available-Tasks, 2024. Accessed: 2024.

-
[Fac24b]

Hugging Face. Evaluate the model on a server or container - lighteval wiki. https://github.com/huggingface/lighteval/wiki/Evaluate-the-model-on-a-server-or-container, 2024. Accessed: 2024.

-
[Fac24c]

Hugging Face. Gpt-2 documentation - hugging face transformers. https://huggingface.co/docs/transformers/model_doc/gpt2, 2024. Accessed: 2024.

-
+
[Fac24d]

Hugging Face. Llm as a judge. https://huggingface.co/learn/cookbook/en/llm_judge, 2024. Accessed: 2024.

-
[Fac24e]

Hugging Face. Metric list - lighteval wiki. https://github.com/huggingface/lighteval/wiki/Metric-List, 2024. Accessed: 2024.

-
+
[Fac24f]

Hugging Face. Open llm leaderboard. Hugging Face Spaces, 2024. URL: https://huggingface.co/spaces/open-llm-leaderboard/blog.

-
+
[FHWT23] (1,2)

Clémentine Fourrier, Nathan Habib, Thomas Wolf, and Lewis Tunstall. Lighteval: a lightweight framework for llm evaluation. 2023. URL: https://github.com/huggingface/lighteval.

-
+
[HBB+21]

Dan Hendrycks, Collin Burns, Steven Basart, Andy Zou, Mantas Mazeika, Dawn Song, and Jacob Steinhardt. Measuring massive multitask language understanding. 2021. URL: https://arxiv.org/abs/2009.03300, arXiv:2009.03300.

-
+
[HBD+20]

Ari Holtzman, Jan Buys, Li Du, Maxwell Forbes, and Yejin Choi. The curious case of neural text degeneration. 2020. URL: https://arxiv.org/abs/1904.09751, arXiv:1904.09751.

-
[HYC+24]

Binyuan Hui, Jian Yang, Zeyu Cui, Jiaxi Yang, Dayiheng Liu, Lei Zhang, Tianyu Liu, Jiajun Zhang, Bowen Yu, Kai Dang, and others. Qwen2. 5-coder technical report. arXiv preprint arXiv:2409.12186, 2024.

-
+
[LXS+24] (1,2,3)

Zhen Li, Xiaohan Xu, Tao Shen, Can Xu, Jia-Chen Gu, Yuxuan Lai, Chongyang Tao, and Shuai Ma. Leveraging large language models for nlg evaluation: advances and challenges. 2024. URL: https://arxiv.org/abs/2401.07103, arXiv:2401.07103.

-
+
[LBL+23]

Percy Liang, Rishi Bommasani, Tony Lee, Dimitris Tsipras, Dilara Soylu, Michihiro Yasunaga, Yian Zhang, Deepak Narayanan, Yuhuai Wu, Ananya Kumar, Benjamin Newman, Binhang Yuan, Bobby Yan, Ce Zhang, Christian Cosgrove, Christopher D. Manning, Christopher Ré, Diana Acosta-Navas, Drew A. Hudson, Eric Zelikman, Esin Durmus, Faisal Ladhak, Frieda Rong, Hongyu Ren, Huaxiu Yao, Jue Wang, Keshav Santhanam, Laurel Orr, Lucia Zheng, Mert Yuksekgonul, Mirac Suzgun, Nathan Kim, Neel Guha, Niladri Chatterji, Omar Khattab, Peter Henderson, Qian Huang, Ryan Chi, Sang Michael Xie, Shibani Santurkar, Surya Ganguli, Tatsunori Hashimoto, Thomas Icard, Tianyi Zhang, Vishrav Chaudhary, William Wang, Xuechen Li, Yifan Mai, Yuhui Zhang, and Yuta Koreeda. Holistic evaluation of language models. 2023. URL: https://arxiv.org/abs/2211.09110, arXiv:2211.09110.

-
+
[LHE22]

Stephanie Lin, Jacob Hilton, and Owain Evans. Truthfulqa: measuring how models mimic human falsehoods. 2022. URL: https://arxiv.org/abs/2109.07958, arXiv:2109.07958.

-
+
[SRR+23]

Aarohi Srivastava, Abhinav Rastogi, Abhishek Rao, Abu Awal Md Shoeb, Abubakar Abid, Adam Fisch, Adam R. Brown, Adam Santoro, Aditya Gupta, Adrià Garriga-Alonso, Agnieszka Kluska, Aitor Lewkowycz, Akshat Agarwal, Alethea Power, Alex Ray, Alex Warstadt, Alexander W. Kocurek, Ali Safaya, Ali Tazarv, Alice Xiang, Alicia Parrish, Allen Nie, Aman Hussain, Amanda Askell, Amanda Dsouza, Ambrose Slone, Ameet Rahane, Anantharaman S. Iyer, Anders Andreassen, Andrea Madotto, Andrea Santilli, Andreas Stuhlmüller, Andrew Dai, Andrew La, Andrew Lampinen, Andy Zou, Angela Jiang, Angelica Chen, Anh Vuong, Animesh Gupta, Anna Gottardi, Antonio Norelli, Anu Venkatesh, Arash Gholamidavoodi, Arfa Tabassum, Arul Menezes, Arun Kirubarajan, Asher Mullokandov, Ashish Sabharwal, Austin Herrick, Avia Efrat, Aykut Erdem, Ayla Karakaş, B. Ryan Roberts, Bao Sheng Loe, Barret Zoph, Bartłomiej Bojanowski, Batuhan Özyurt, Behnam Hedayatnia, Behnam Neyshabur, Benjamin Inden, Benno Stein, Berk Ekmekci, Bill Yuchen Lin, Blake Howald, Bryan Orinion, Cameron Diao, Cameron Dour, Catherine Stinson, Cedrick Argueta, César Ferri Ramírez, Chandan Singh, Charles Rathkopf, Chenlin Meng, Chitta Baral, Chiyu Wu, Chris Callison-Burch, Chris Waites, Christian Voigt, Christopher D. Manning, Christopher Potts, Cindy Ramirez, Clara E. Rivera, Clemencia Siro, Colin Raffel, Courtney Ashcraft, Cristina Garbacea, Damien Sileo, Dan Garrette, Dan Hendrycks, Dan Kilman, Dan Roth, Daniel Freeman, Daniel Khashabi, Daniel Levy, Daniel Moseguí González, Danielle Perszyk, Danny Hernandez, Danqi Chen, Daphne Ippolito, Dar Gilboa, David Dohan, David Drakard, David Jurgens, Debajyoti Datta, Deep Ganguli, Denis Emelin, Denis Kleyko, Deniz Yuret, Derek Chen, Derek Tam, Dieuwke Hupkes, Diganta Misra, Dilyar Buzan, Dimitri Coelho Mollo, Diyi Yang, Dong-Ho Lee, Dylan Schrader, Ekaterina Shutova, Ekin Dogus Cubuk, Elad Segal, Eleanor Hagerman, Elizabeth Barnes, Elizabeth Donoway, Ellie Pavlick, Emanuele Rodola, Emma Lam, Eric Chu, Eric Tang, Erkut Erdem, Ernie Chang, Ethan A. Chi, Ethan Dyer, Ethan Jerzak, Ethan Kim, Eunice Engefu Manyasi, Evgenii Zheltonozhskii, Fanyue Xia, Fatemeh Siar, Fernando Martínez-Plumed, Francesca Happé, Francois Chollet, Frieda Rong, Gaurav Mishra, Genta Indra Winata, Gerard de Melo, Germán Kruszewski, Giambattista Parascandolo, Giorgio Mariani, Gloria Wang, Gonzalo Jaimovitch-López, Gregor Betz, Guy Gur-Ari, Hana Galijasevic, Hannah Kim, Hannah Rashkin, Hannaneh Hajishirzi, Harsh Mehta, Hayden Bogar, Henry Shevlin, Hinrich Schütze, Hiromu Yakura, Hongming Zhang, Hugh Mee Wong, Ian Ng, Isaac Noble, Jaap Jumelet, Jack Geissinger, Jackson Kernion, Jacob Hilton, Jaehoon Lee, Jaime Fernández Fisac, James B. Simon, James Koppel, James Zheng, James Zou, Jan Kocoń, Jana Thompson, Janelle Wingfield, Jared Kaplan, Jarema Radom, Jascha Sohl-Dickstein, Jason Phang, Jason Wei, Jason Yosinski, Jekaterina Novikova, Jelle Bosscher, Jennifer Marsh, Jeremy Kim, Jeroen Taal, Jesse Engel, Jesujoba Alabi, Jiacheng Xu, Jiaming Song, Jillian Tang, Joan Waweru, John Burden, John Miller, John U. Balis, Jonathan Batchelder, Jonathan Berant, Jörg Frohberg, Jos Rozen, Jose Hernandez-Orallo, Joseph Boudeman, Joseph Guerr, Joseph Jones, Joshua B. Tenenbaum, Joshua S. Rule, Joyce Chua, Kamil Kanclerz, Karen Livescu, Karl Krauth, Karthik Gopalakrishnan, Katerina Ignatyeva, Katja Markert, Kaustubh D. 
Dhole, Kevin Gimpel, Kevin Omondi, Kory Mathewson, Kristen Chiafullo, Ksenia Shkaruta, Kumar Shridhar, Kyle McDonell, Kyle Richardson, Laria Reynolds, Leo Gao, Li Zhang, Liam Dugan, Lianhui Qin, Lidia Contreras-Ochando, Louis-Philippe Morency, Luca Moschella, Lucas Lam, Lucy Noble, Ludwig Schmidt, Luheng He, Luis Oliveros Colón, Luke Metz, Lütfi Kerem Şenel, Maarten Bosma, Maarten Sap, Maartje ter Hoeve, Maheen Farooqi, Manaal Faruqui, Mantas Mazeika, Marco Baturan, Marco Marelli, Marco Maru, Maria Jose Ramírez Quintana, Marie Tolkiehn, Mario Giulianelli, Martha Lewis, Martin Potthast, Matthew L. Leavitt, Matthias Hagen, Mátyás Schubert, Medina Orduna Baitemirova, Melody Arnaud, Melvin McElrath, Michael A. Yee, Michael Cohen, Michael Gu, Michael Ivanitskiy, Michael Starritt, Michael Strube, Michał Swędrowski, Michele Bevilacqua, Michihiro Yasunaga, Mihir Kale, Mike Cain, Mimee Xu, Mirac Suzgun, Mitch Walker, Mo Tiwari, Mohit Bansal, Moin Aminnaseri, Mor Geva, Mozhdeh Gheini, Mukund Varma T, Nanyun Peng, Nathan A. Chi, Nayeon Lee, Neta Gur-Ari Krakover, Nicholas Cameron, Nicholas Roberts, Nick Doiron, Nicole Martinez, Nikita Nangia, Niklas Deckers, Niklas Muennighoff, Nitish Shirish Keskar, Niveditha S. Iyer, Noah Constant, Noah Fiedel, Nuan Wen, Oliver Zhang, Omar Agha, Omar Elbaghdadi, Omer Levy, Owain Evans, Pablo Antonio Moreno Casares, Parth Doshi, Pascale Fung, Paul Pu Liang, Paul Vicol, Pegah Alipoormolabashi, Peiyuan Liao, Percy Liang, Peter Chang, Peter Eckersley, Phu Mon Htut, Pinyu Hwang, Piotr Miłkowski, Piyush Patil, Pouya Pezeshkpour, Priti Oli, Qiaozhu Mei, Qing Lyu, Qinlang Chen, Rabin Banjade, Rachel Etta Rudolph, Raefer Gabriel, Rahel Habacker, Ramon Risco, Raphaël Millière, Rhythm Garg, Richard Barnes, Rif A. Saurous, Riku Arakawa, Robbe Raymaekers, Robert Frank, Rohan Sikand, Roman Novak, Roman Sitelew, Ronan LeBras, Rosanne Liu, Rowan Jacobs, Rui Zhang, Ruslan Salakhutdinov, Ryan Chi, Ryan Lee, Ryan Stovall, Ryan Teehan, Rylan Yang, Sahib Singh, Saif M. Mohammad, Sajant Anand, Sam Dillavou, Sam Shleifer, Sam Wiseman, Samuel Gruetter, Samuel R. Bowman, Samuel S. Schoenholz, Sanghyun Han, Sanjeev Kwatra, Sarah A. Rous, Sarik Ghazarian, Sayan Ghosh, Sean Casey, Sebastian Bischoff, Sebastian Gehrmann, Sebastian Schuster, Sepideh Sadeghi, Shadi Hamdan, Sharon Zhou, Shashank Srivastava, Sherry Shi, Shikhar Singh, Shima Asaadi, Shixiang Shane Gu, Shubh Pachchigar, Shubham Toshniwal, Shyam Upadhyay, Shyamolima, Debnath, Siamak Shakeri, Simon Thormeyer, Simone Melzi, Siva Reddy, Sneha Priscilla Makini, Soo-Hwan Lee, Spencer Torene, Sriharsha Hatwar, Stanislas Dehaene, Stefan Divic, Stefano Ermon, Stella Biderman, Stephanie Lin, Stephen Prasad, Steven T. Piantadosi, Stuart M. 
Shieber, Summer Misherghi, Svetlana Kiritchenko, Swaroop Mishra, Tal Linzen, Tal Schuster, Tao Li, Tao Yu, Tariq Ali, Tatsu Hashimoto, Te-Lin Wu, Théo Desbordes, Theodore Rothschild, Thomas Phan, Tianle Wang, Tiberius Nkinyili, Timo Schick, Timofei Kornev, Titus Tunduny, Tobias Gerstenberg, Trenton Chang, Trishala Neeraj, Tushar Khot, Tyler Shultz, Uri Shaham, Vedant Misra, Vera Demberg, Victoria Nyamai, Vikas Raunak, Vinay Ramasesh, Vinay Uday Prabhu, Vishakh Padmakumar, Vivek Srikumar, William Fedus, William Saunders, William Zhang, Wout Vossen, Xiang Ren, Xiaoyu Tong, Xinran Zhao, Xinyi Wu, Xudong Shen, Yadollah Yaghoobzadeh, Yair Lakretz, Yangqiu Song, Yasaman Bahri, Yejin Choi, Yichi Yang, Yiding Hao, Yifu Chen, Yonatan Belinkov, Yu Hou, Yufang Hou, Yuntao Bai, Zachary Seid, Zhuoye Zhao, Zijian Wang, Zijie J. Wang, Zirui Wang, and Ziyi Wu. Beyond the imitation game: quantifying and extrapolating the capabilities of language models. 2023. URL: https://arxiv.org/abs/2206.04615, arXiv:2206.04615.

-
+
[WPN+19]

Alex Wang, Yada Pruksachatkun, Nikita Nangia, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Superglue: a stickier benchmark for general-purpose language understanding systems. Advances in Neural Information Processing Systems, 2019.

-
+
[WSM+19]

Alex Wang, Amanpreet Singh, Julian Michael, Felix Hill, Omer Levy, and Samuel R. Bowman. Glue: a multi-task benchmark and analysis platform for natural language understanding. 2019. URL: https://arxiv.org/abs/1804.07461, arXiv:1804.07461.

-
+
[WTB+22]

Jason Wei, Yi Tay, Rishi Bommasani, Colin Raffel, Barret Zoph, Sebastian Borgeaud, Dani Yogatama, Maarten Bosma, Denny Zhou, Donald Metzler, Ed H. Chi, Tatsunori Hashimoto, Oriol Vinyals, Percy Liang, Jeff Dean, and William Fedus. Emergent abilities of large language models. 2022. URL: https://arxiv.org/abs/2206.07682, arXiv:2206.07682.

-
+
[WDR+24]

Colin White, Samuel Dooley, Manley Roberts, Arka Pal, Ben Feuer, Siddhartha Jain, Ravid Shwartz-Ziv, Neel Jain, Khalid Saifullah, Siddartha Naidu, Chinmay Hegde, Yann LeCun, Tom Goldstein, Willie Neiswanger, and Micah Goldblum. Livebench: a challenging, contamination-free llm benchmark. 2024. URL: https://arxiv.org/abs/2406.19314, arXiv:2406.19314.

-
[YYH+24]

An Yang, Baosong Yang, Binyuan Hui, Bo Zheng, Bowen Yu, Chang Zhou, Chengpeng Li, Chengyuan Li, Dayiheng Liu, Fei Huang, Guanting Dong, Haoran Wei, Huan Lin, Jialong Tang, Jialin Wang, Jian Yang, Jianhong Tu, Jianwei Zhang, Jianxin Ma, Jin Xu, Jingren Zhou, Jinze Bai, Jinzheng He, Junyang Lin, Kai Dang, Keming Lu, Keqin Chen, Kexin Yang, Mei Li, Mingfeng Xue, Na Ni, Pei Zhang, Peng Wang, Ru Peng, Rui Men, Ruize Gao, Runji Lin, Shijie Wang, Shuai Bai, Sinan Tan, Tianhang Zhu, Tianhao Li, Tianyu Liu, Wenbin Ge, Xiaodong Deng, Xiaohuan Zhou, Xingzhang Ren, Xinyu Zhang, Xipin Wei, Xuancheng Ren, Yang Fan, Yang Yao, Yichang Zhang, Yu Wan, Yunfei Chu, Yuqiong Liu, Zeyu Cui, Zhenru Zhang, and Zhihao Fan. Qwen2 technical report. arXiv preprint arXiv:2407.10671, 2024.

-
+
[ZCS+23]

Lianmin Zheng, Wei-Lin Chiang, Ying Sheng, Siyuan Zhuang, Zhanghao Wu, Yonghao Zhuang, Zi Lin, Zhuohan Li, Dacheng Li, Eric P. Xing, Hao Zhang, Joseph E. Gonzalez, and Ion Stoica. Judging llm-as-a-judge with mt-bench and chatbot arena. 2023. URL: https://arxiv.org/abs/2306.05685, arXiv:2306.05685.

-
[HuggingFace24]

Hugging Face. Number of models on hugging face. https://huggingface.co/spaces/huggingface/open-source-ai-year-in-review-2024?day=4, 2024. Accessed: 12/06/2024.

-
[MetaAI24]

Meta AI. Meta llama models on hugging face. https://huggingface.co/meta-llama, 2024. Accessed: 2024.

-
[PromptFoo24]

PromptFoo. Promptfoo - open-source prompt engineering toolkit. https://www.promptfoo.dev/, 2024. Accessed: 12/06/2024.

diff --git a/tamingllms/_build/html/notebooks/output_size_limit.html b/tamingllms/_build/html/notebooks/output_size_limit.html index 0930228..c910928 100644 --- a/tamingllms/_build/html/notebooks/output_size_limit.html +++ b/tamingllms/_build/html/notebooks/output_size_limit.html @@ -194,7 +194,7 @@
2. Output Size Limitations

Only those who will risk going too far can possibly find out how far one can go.

—T.S. Eliot

@@ -202,34 +202,34 @@

Contents

2.1. What are Token Limits?

Tokens are the basic units that LLMs process text with. A token can be as short as a single character or as long as a complete word. In English, a general rule of thumb is that 1 token ≈ 4 characters or ¾ of a word.

The max_output_tokens parameter, available in most modern LLMs, determines the maximum length of text that an LLM can generate in a single response. Table 2.1 shows the max_output_tokens for several key models, which typically ranges between 4096 and 16384 tokens. Contrary to what one might expect, the model does not “summarize the answer” so that it stays within the max_output_tokens limit. Instead, it simply stops once it reaches this limit, even mid-sentence, i.e. the response may be truncated.
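As a quick illustration of how text maps to tokens, the sketch below counts tokens with the tiktoken library; the encoding name is an assumption and should match your target model, and the commented API call shows where an explicit output cap would be applied.

# Rough token accounting with tiktoken (encoding choice is an assumption).
import tiktoken

encoding = tiktoken.get_encoding("cl100k_base")
text = "Taming LLMs is about navigating their limitations in practice."
tokens = encoding.encode(text)
print(len(text), "characters ->", len(tokens), "tokens")  # roughly 4 characters per token

# When calling a chat API, the output cap is set explicitly, e.g.:
# response = client.chat.completions.create(
#     model="gpt-4o-mini", messages=messages, max_tokens=50
# )
# Generation stops at the cap, even mid-sentence.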

@@ -289,7 +289,7 @@

2.2. Problem Statement

The max_output_tokens limit in LLMs poses a significant challenge for users who need to generate long outputs, as it may result in truncated content and/or incomplete information.

  1. Truncated Content: Users aiming to generate extensive content, such as detailed reports or comprehensive articles, may find their outputs abruptly cut off due to the max_output_tokens limit. This truncation can result in incomplete information and disrupt the flow of the content.

  2. @@ -298,7 +298,7 @@

2.3. Content Chunking with Contextual Linking

    Content chunking with contextual linking is a technique used to manage the max_output_tokens limitation by breaking down long-form content into smaller, manageable chunks. This approach allows the LLM to focus on smaller sections of the input, enabling it to generate more complete and detailed responses for each chunk while maintaining coherence and context across the entire output.

    1. Chunking the Content: The input content is split into smaller chunks. This allows the LLM to process each chunk individually, focusing on generating a complete and detailed response for that specific section of the input.

    2. @@ -309,7 +309,7 @@

      max_output_tokens limitation and generate coherent long-form content without truncation.

      Let’s examine an example implementation of this technique.

2.3.1. Generating long-form content

      • Goal: Generate a long-form report analyzing a company’s financial statement.

      • Input: A company’s 10K SEC filing.

      • @@ -322,7 +322,7 @@

        Fig. 2.1 illustrates the process we will follow for handling long-form content generation with Large Language Models through “Content Chunking with Contextual Linking.” It shows how input content is first split into manageable chunks using a chunking function (e.g. CharacterTextSplitter with tiktoken tokenizer), then each chunk is processed sequentially while maintaining context from previous chunks. For each chunk, the system updates the context, generates a dynamic prompt with specific parameters, makes a call to the LLM chain, and stores the response. After all chunks are processed, the individual responses are combined with newlines to create the final report, effectively working around the token limit constraints of LLMs while maintaining coherence across the generated content.

2.3.1.1. Step 1: Chunking the Content

        There are different methods for chunking, and each of them might be appropriate for different situations. However, we can broadly group chunking strategies in two types:

• Fixed-size Chunking: This is the most common and straightforward approach to chunking. We simply decide the number of tokens in our chunk and, optionally, whether there should be any overlap between them. In general, we will want to keep some overlap between chunks to make sure that the semantic context doesn’t get lost between them. Fixed-size chunking is a reasonable path in many common cases: compared to other forms of chunking, it is computationally cheap and simple to use since it doesn’t require any specialized techniques or libraries (see the sketch after this list).

        • @@ -359,7 +359,7 @@
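Below is a minimal fixed-size chunking sketch using LangChain's CharacterTextSplitter with the tiktoken tokenizer mentioned earlier; the chunk size and overlap values are illustrative assumptions, and in older LangChain releases the class lives in langchain.text_splitter instead.

# Fixed-size chunking sketch; chunk_size/chunk_overlap values are illustrative.
from langchain_text_splitters import CharacterTextSplitter

def get_chunks(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    # Measures length in tokens (via tiktoken) rather than characters,
    # so chunk sizes line up with the model's token budget.
    splitter = CharacterTextSplitter.from_tiktoken_encoder(
        chunk_size=chunk_size,
        chunk_overlap=chunk_overlap,
    )
    return splitter.split_text(text)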

2.3.1.2. Step 2: Writing the Base Prompt Template

          We will write a base prompt template which will serve as a foundational structure for all chunks, ensuring consistency in the instructions and context provided to the language model. The template includes the following parameters:

          • role: Defines the role or persona the model should assume.

          • @@ -426,7 +426,7 @@

2.3.1.3. Step 3: Constructing Dynamic Prompt Parameters

            Now, we will write a function (get_dynamic_prompt_template) that constructs prompt parameters dynamically for each chunk.
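A sketch of what such a function might look like is shown below. Apart from get_dynamic_prompt_template and the role parameter introduced in the base template, the field names (context, instruction, chunk and the first/last-chunk flags) are hypothetical placeholders for the parameters constructed per chunk.

# Hypothetical sketch: builds per-chunk prompt parameters while carrying context forward.
def get_dynamic_prompt_template(chunk: str, chunk_index: int, total_chunks: int,
                                running_context: str) -> dict:
    return {
        "role": "financial analyst",                       # persona the model should assume
        "context": running_context,                        # what has been generated so far
        "instruction": (
            f"Write part {chunk_index + 1} of {total_chunks} of the report, "
            "continuing seamlessly from the context above."
        ),
        "chunk": chunk,                                    # the section of the filing to analyze
        "is_first_chunk": chunk_index == 0,                # e.g. include the report header
        "is_last_chunk": chunk_index == total_chunks - 1,  # e.g. add the conclusion
    }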

            @@ -479,7 +479,7 @@

2.3.1.4. Step 4: Generating the Report

            Finally, we will write a function that generates the actual report by calling the LLMChain with the dynamically updated prompt parameters for each chunk and concatenating the results at the end.
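A condensed sketch of this loop is given below; llm_chain stands in for a prompt-plus-model chain that returns the generated text as a string, and the helper names are the (hypothetical) ones from the previous steps.

# Sketch: iterate over chunks, keep a running context, and join the pieces at the end.
def generate_report(text: str, llm_chain) -> str:
    chunks = get_chunks(text)                    # Step 1 sketch above
    running_context = ""
    sections: list[str] = []
    for i, chunk in enumerate(chunks):
        params = get_dynamic_prompt_template(chunk, i, len(chunks), running_context)
        section = llm_chain.invoke(params)       # assumption: the chain returns a string
        sections.append(section)
        running_context = section[-2000:]        # carry forward a tail of the latest output
    return "\n\n".join(sections)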

            @@ -538,7 +538,7 @@

2.3.1.5. Example Usage

            # Load the text from sample 10K SEC filing
            @@ -606,7 +606,7 @@ 

2.3.2. Discussion

            Results from the generated report present a few interesting aspects:

            • Coherence: The generated report demonstrates a high level of coherence. The sections are logically structured, and the flow of information is smooth. Each part of the report builds upon the previous sections, providing a comprehensive analysis of Apple Inc.’s financial performance and key risk factors. The use of headings and subheadings helps in maintaining clarity and organization throughout the document.

            • @@ -620,7 +620,7 @@

2.4. Implications

              Implementing context chunking with contextual linking is a practical solution to manage the output size limitations of LLMs. However, this approach comes with its own set of implications that developers must consider.

              1. Increased Development Complexity: Implementing strategies to overcome the maximum output token length introduces additional layers of complexity to the application design. It necessitates meticulous management of context across multiple outputs to maintain coherence. Ensuring that each chunk retains the necessary context for the conversation or document can be challenging and often requires advanced logic to handle transitions seamlessly.

              2. @@ -630,7 +630,7 @@

2.5. Future Considerations

                As models evolve, we can expect several advancements that will significantly impact how we handle output size limitations:

                1. Contextual Awareness: Future LLMs will likely have improved contextual awareness - or as Mustafa Suleyman would call “infinite memory”, enabling them to better understand and manage the context of a conversation or document over long interactions. This will reduce the need for repetitive context setting and improve the overall user experience.

                2. @@ -642,11 +642,11 @@

2.6. Conclusion

                  In conclusion, while managing output size limitations in LLMs presents significant challenges, it also drives innovation in application design and optimization strategies. By implementing techniques such as context chunking, efficient prompt templates, and graceful fallbacks, developers can mitigate these limitations and enhance the performance and cost-effectiveness of their applications. As the technology evolves, advancements in contextual awareness, token efficiency, and memory management will further empower developers to build more robust and scalable LLM-powered systems. It is crucial to stay informed about these developments and continuously adapt to leverage the full potential of LLMs while addressing their inherent constraints.

2.7. References

        [LangChain24] diff --git a/tamingllms/_build/html/notebooks/structured_output.html b/tamingllms/_build/html/notebooks/structured_output.html index dabeed3..ac85e3f 100644 --- a/tamingllms/_build/html/notebooks/structured_output.html +++ b/tamingllms/_build/html/notebooks/structured_output.html @@ -196,7 +196,7 @@
3. Wrestling with Structured Output

        In limits, there is freedom. Creativity thrives within structure.

        —Julia B. Cameron

        @@ -204,42 +204,42 @@

        Contents

3.1. Introduction

        Large language models (LLMs) excel at generating human-like text, but they often struggle to produce output in a structured format consistently. This poses a significant challenge when we need LLMs to generate data that can be easily processed by other systems, such as databases, APIs, or other software applications. Sometimes, even with a well-crafted prompt, an LLM might produce an unstructured response when a structured one is expected. This can be particularly challenging when integrating LLMs into systems that require specific data formats.

        As a motivating example, consider the following simple task: Given a segment of a SEC financial filing, generate a two-person discussion about the key financial data from the text in JSON format, simulating what would be a real-world discussion about the underlying companies’ disclosed financial information. We would like to generate a structured output that can be easily parsed and integrated with other systems.

        Throughout this notebook, we will consider as input a segment of a sample SEC filing of Apple Inc.

        @@ -345,7 +345,7 @@

3.2. Problem Statement

        Obtaining structured output from LLMs presents several significant challenges:

        • Inconsistency: LLMs often produce unpredictable results, sometimes generating well-structured output and other times deviating from the expected format.

        • @@ -354,7 +354,7 @@

3.3. User Needs

          What user needs drive the demand for LLM output constraints when building LLM-based applications? In a recent work by Google Research [Liu et al., 2024], the authors explore the user need for constraints on the output of large language models, drawing on a survey of 51 industry professionals who use LLMs in their work. These needs can be broadly categorized as follows:

          1. Improving Developer Efficiency and Workflow

            @@ -377,10 +377,10 @@

3.4. Solutions

            Several strategies and tools can be employed to address the challenges of structured output from LLMs.

3.4.1. Strategies

            • Schema Guidance: Providing the LLM with a clear schema or blueprint of the desired output structure helps to constrain its generation and improve consistency. This can be achieved by using tools like Pydantic to define the expected data structure and then using that definition to guide the LLM’s output.

            • Output Parsing: When LLMs don’t natively support structured output, parsing their text output using techniques like regular expressions or dedicated parsing libraries can extract the desired information. For example, you can use regular expressions to extract specific patterns from the LLM’s output, or you can use libraries like Pydantic to parse the output into structured data objects.

            • @@ -388,9 +388,9 @@

3.4.2. Techniques and Tools

3.4.2.1. One-Shot Prompts

              In one-shot prompting, you provide a single example of the desired output format within the prompt.
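A minimal illustration of the idea is shown below: a single worked example inside the prompt pins down the exact JSON shape we expect (the field names and values are made up for this sketch).

# One-shot prompt sketch: one in-prompt example demonstrates the expected JSON shape.
ONE_SHOT_PROMPT = (
    "Extract the key financial figure from the filing excerpt and reply in JSON.\n\n"
    "Example\n"
    'Input: "Revenue was $10.0 billion, up 5% year over year."\n'
    'Output: {"metric": "revenue", "value_usd_billions": 10.0, "yoy_change_pct": 5}\n\n'
    "Input: "
)

def build_prompt(filing_excerpt: str) -> str:
    # The new excerpt is appended after the single worked example.
    return ONE_SHOT_PROMPT + f'"{filing_excerpt}"\nOutput:'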

              @@ -457,7 +457,7 @@

3.4.2.2. Structured Output with Provider-Specific APIs

One-shot prompting is a simple technique that can lead to material improvements in structured output, though it may not be sufficient for complex (e.g. nested) structures and/or when the model's output needs to be restricted to a specific set of options or types.

              Provider-specific APIs can offer ways to handle those challenges. We will explore two approaches here using OpenAI’s API:

                @@ -466,7 +466,7 @@

3.4.2.3. JSON Mode

JSON mode is a feature provided by most LLM API providers, such as OpenAI, that allows the model to generate output in JSON format. This is particularly useful when you need structured data as a result, such as when parsing the output programmatically or integrating it with other systems that require JSON input. As depicted in Fig. 3.1, JSON mode is implemented by instructing the LLM to use JSON as the response format and, optionally, defining a target schema.
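A minimal sketch with OpenAI's chat completions API is shown below; the model name is illustrative, and note that JSON mode expects the word "JSON" to appear somewhere in the messages.

# JSON mode sketch: ask the API to return a JSON object (model name is illustrative).
from openai import OpenAI

client = OpenAI()
response = client.chat.completions.create(
    model="gpt-4o-mini",
    response_format={"type": "json_object"},
    messages=[
        {"role": "system",
         "content": "Reply in JSON with the keys 'speaker' and 'comment'."},
        {"role": "user",
         "content": "Summarize the filing excerpt as a one-turn discussion."},
    ],
)
print(response.choices[0].message.content)  # a JSON-formatted string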

                JSON Mode @@ -604,7 +604,7 @@

3.4.3. LangChain

LangChain is a framework designed to simplify the development of LLM applications. It provides an abstraction layer over many LLM providers, including OpenAI, and offers several tools for parsing structured output.

                In particular, LangChain offers the with_structured_output method, which can be used with LLMs that support structured output APIs, allowing you to enforce a schema directly within the prompt.
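As a sketch (the model name and schema are illustrative), a Pydantic class can be passed directly to with_structured_output:

# with_structured_output sketch: enforce a Pydantic schema on the model's reply.
from langchain_openai import ChatOpenAI
from pydantic import BaseModel, Field

class Turn(BaseModel):
    speaker: str = Field(description="Name of the discussion participant")
    comment: str = Field(description="What the participant says about the filing")

llm = ChatOpenAI(model="gpt-4o-mini")              # model name is illustrative
structured_llm = llm.with_structured_output(Turn)
turn = structured_llm.invoke("React to Apple's reported revenue in one sentence.")
print(turn.speaker, ":", turn.comment)             # returns a validated Turn instance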

                @@ -664,7 +664,7 @@

                .with_structured_output() can be found here.

3.4.4. Outlines

              Outlines [Outlines, 2024] is a library specifically focused on structured text generation from LLMs. Under the hood, Outlines works by adjusting the probability distribution of the model’s output logits - the raw scores from the final layer of the neural network that are normally converted into text tokens. By introducing carefully crafted logit biases, Outlines can guide the model to prefer certain tokens over others, effectively constraining its outputs to a predefined set of valid options. This provides fine-grained control over the model’s generation process. In that way, Outlines provides several powerful features:

              • Multiple Choice Generation: Restrict the LLM output to a predefined set of options.

              • @@ -743,7 +743,7 @@
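Building on the multiple choice feature listed above, a minimal Outlines sketch might look like the following; the model id is illustrative, and the models/generate module names reflect Outlines' documented API at the time of writing, so they should be checked against the installed version.

# Outlines sketch: constrain generation to a fixed set of answers.
import outlines

model = outlines.models.transformers("HuggingFaceTB/SmolLM2-360M-Instruct")  # illustrative model id
generator = outlines.generate.choice(model, ["Yes", "No", "Unclear"])
answer = generator("Did Apple's revenue grow in the latest fiscal year? Answer Yes, No, or Unclear.")
print(answer)  # guaranteed to be one of the three options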

3.4.5. Ollama

                Ollama is a popular tool that allows you to run large language models (LLMs) locally. It has recently added support for structured output generation. The current ollama implementation leverages llama.cpp GBNF (GGML BNF) grammars [Ggerganov, 2024] to enable structured output generation. llama.cpp GBNF forces language models to generate output in specific, predefined formats by constraining their outputs to follow precise rules and patterns. The system accomplishes this through a formal grammar specification that defines exactly how valid outputs can be constructed. It’s essentially an extension of BNF (Backus-Naur Form) [Wikipedia contributors, 2024] with some modern regex-like features added. These rules carefully define what elements are allowed, how they can be combined, and what patterns of repetition and sequencing are valid. By enforcing these constraints during generation, GBNF ensures the model’s output strictly adheres to the desired format.

Ollama first introduced structured output generation in version 0.5.1, providing support for JSON output and noting that additional formats are coming soon.

                Let’s replicate our previous structured output generation example with Ollama. First, make sure you have Ollama installed. You can find installation instructions here.
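To give a flavor of the JSON option, here is a sketch using the ollama Python client; the model name is illustrative, a local Ollama server must be running with that model pulled, and older client versions return a plain dict while newer ones expose an object with the same fields.

# Ollama JSON output sketch (requires a running local Ollama server and a pulled model).
import json
import ollama

response = ollama.chat(
    model="llama3.2",  # illustrative; any locally pulled model
    messages=[{
        "role": "user",
        "content": "Describe Apple's revenue trend as JSON with keys 'speaker' and 'comment'.",
    }],
    format="json",     # constrain the reply to JSON
)
data = json.loads(response["message"]["content"])
print(data)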

                @@ -840,9 +840,9 @@

3.5. Discussion

3.5.1. Comparing Solutions

The choice of framework for structured LLM output depends heavily on specific constraints, requirements and use cases. LangChain is the most widely used LLM framework today, with a large developer community, but its structured output support depends on the underlying LLM provider. Ollama enables straightforward local deployment and experimentation, democratizing access to LLMs while fostering privacy and control; however, it currently only offers JSON format, with further formats to come. Outlines emerges as a solution with great flexibility and control over output structure while providing support for a wide range of LLMs. Table 3.1 provides a summary comparison of the different frameworks.

@@ -888,7 +888,7 @@

3.5.2. Best Practices

  • Clear Schema Definition: Define the desired output structure clearly. This can be done in several ways including schemas, types, or Pydantic models as appropriate. This ensures the LLM knows exactly what format is expected.

  • Descriptive Naming: Use meaningful names for fields and elements in your schema. This makes the output more understandable and easier to work with.

  • @@ -897,7 +897,7 @@

3.5.3. Research and Ongoing Debate

The use of structured output for Large Language Models (LLMs) is a developing area. While the ability to constrain LLM outputs offers clear benefits in parsing, robustness, and integration, there is growing debate on whether it comes at the cost of performance and reasoning ability. Research in this area should be taken with a grain of salt: findings are mixed, often depend on the specific task and model family at hand, and model families are not always comparable and are updated frequently. Nonetheless, early findings provide some interesting insights as to why there is no one-size-fits-all solution when it comes to structured output from LLMs.

There is some evidence indicating that LLMs may have bias in their handling of different output formats [Long et al., 2024]. The study examined common output structures like multiple-choice answers, wrapped text, lists, and key-value mappings. The authors analyzed key LLM model families, namely Gemma, Mistral, and ChatGPT, uncovering bias across multiple tasks and formats. The researchers attributed these biases to the models’ underlying token distributions for different formats. An example of this format bias emerged in the comparison between JSON and YAML outputs. While models like Mistral and Gemma excelled at generating JSON structures, they performed notably worse with YAML. Their YAML outputs often contained extraneous information that degraded output quality. This disparity likely stems from JSON’s prevalence in training data, highlighting how a format’s popularity directly influences model performance. While the studied models can probably be considered outdated by now, since models are updated at a rapid pace, it is important to note that addressing format bias is critical for advancing LLMs and ensuring their reliable application in real-world scenarios.

    Recent research “Let Me Speak Freely? A Study on the Impact of Format Restrictions on Performance of Large Language Models” [Tam et al., 2024] suggests that imposing format restrictions on LLMs might impact their performance, particularly in reasoning-intensive tasks. Further evidence [Aider, 2024] suggests LLMs may produce lower quality code if they’re asked to return it as part of a structured JSON response, in particular:

    @@ -927,15 +927,15 @@

3.6. Conclusion

    Extracting structured output from LLMs is crucial for integrating them into real-world applications. By understanding the challenges and employing appropriate strategies and tools, developers can improve the reliability and usability of LLM-powered systems, unlocking their potential to automate complex tasks and generate valuable insights.

3.7. Acknowledgements

    We would like to thank Cameron Pfiffer from the .txt team for his insightful review and feedback.

3.8. References

[Aid24]

diff --git a/tamingllms/_build/html/objects.inv b/tamingllms/_build/html/objects.inv
diff --git a/tamingllms/_build/html/searchindex.js b/tamingllms/_build/html/searchindex.js
"truth": [2, 4], "option": [2, 3, 4], "layer": [2, 3, 4], "repres": [2, 4], "palm": 2, "individu": [2, 3], "target": [2, 4], "further": [2, 3, 4], "see": [2, 4], "avail": [2, 3, 4], "addition": 2, "shown": 2, "fix": [2, 3], "all": [2, 3, 4], "default": [2, 4], "quantifi": 2, "rank": 2, "easi": [2, 3], "addit": [2, 3, 4], "quantit": 2, "among": 2, "aggreg": 2, "plan": [2, 4], "pertain": 2, "previous": [2, 3, 4], "introduc": [2, 3, 4], "doe": [2, 3, 4], "good": [2, 4], "ani": [2, 3, 4], "separ": [2, 3], "synthet": [2, 4], "updat": [2, 3, 4], "reflect": 2, "post": [2, 4], "launch": 2, "timeout": 2, "variat": 2, "maxim": 2, "success": [2, 4], "inter": 2, "rater": 2, "weight": 2, "rel": 2, "priorit": 2, "normal": [2, 4], "absolut": [2, 4], "fail": 2, "confid": [2, 4], "interv": 2, "ti": 2, "veri": 2, "close": 2, "tier": 2, "holist": 2, "built": [2, 4], "mind": 2, "x": 2, "fast": 2, "particularli": [2, 3, 4], "promot": 2, "rapid": 2, "experiment": [2, 4], "iter": [2, 3, 4], "final": [2, 3, 4], "keep": [2, 3], "itself": 2, "confirm": 2, "vi": 2, "later": [2, 4], "chapter": 2, "categor": [2, 4], "broad": [2, 4], "intrins": 2, "extrins": 2, "sequenc": [2, 4], "perplex": 2, "given": [2, 3, 4], "variou": [2, 3, 4], "downstream": [2, 4], "directli": [2, 4], "valuabl": [2, 4], "interest": [2, 3, 4], "sinc": [2, 3, 4], "term": [2, 3], "discrimin": 2, "distinguish": 2, "classifi": 2, "sentiment": [2, 4], "classif": [2, 4], "identifi": [2, 3, 4], "whether": [2, 3, 4], "true": [2, 3, 4], "synthesi": 2, "summar": [2, 3], "log": 2, "discret": 2, "recal": 2, "f1": 2, "match": [2, 4], "exact": 2, "prefix": 2, "translat": 2, "roug": 2, "bleu": 2, "charact": [2, 3, 4], "n": [2, 3], "gram": 2, "characterist": 2, "being": [2, 4], "short": [2, 3, 4], "wide": [2, 3, 4], "definit": [2, 4], "bilingu": 2, "understudi": 2, "overlap": [2, 3], "favor": [2, 4], "due": [2, 3], "breviti": 2, "penalti": 2, "insensit": 2, "semant": [2, 3], "high": [2, 3], "orient": 2, "gist": 2, "word": [2, 3, 4], "sentenc": [2, 3, 4], "ignor": 2, "equival": 2, "influenc": [2, 4], "meteor": 2, "synonym": 2, "stem": [2, 4], "paraphras": 2, "alongsid": 2, "computation": [2, 3], "expens": 2, "cider": 2, "consensu": 2, "descript": [2, 4], "tf": 2, "idf": 2, "caption": 2, "outsid": 2, "reliant": 2, "corpu": 2, "statist": 2, "ter": 2, "edit": 2, "number": [2, 3, 4], "convert": [2, 4], "hypothesi": 2, "penal": 2, "bertscor": 2, "embed": [2, 3], "bert": 2, "spice": 2, "proposit": 2, "scene": 2, "graph": 2, "emphasi": 2, "onli": [2, 3, 4], "pure": 2, "textual": 2, "As": [2, 3, 4], "analyst": [2, 3], "prepar": [2, 3], "dictionari": [2, 4], "rouge_1": 2, "rouge_2": 2, "ideal": [2, 4], "expert": [2, 3, 4], "cheaper": 2, "4o": [2, 3, 4], "mini": [2, 3, 4], "evaluate_summari": 2, "unigram": 2, "bigram": 2, "huggingfac": 2, "librari": [2, 3, 4], "absl": 2, "py": 2, "rouge_scor": 2, "generated_summari": 2, "reference_summari": 2, "arg": [2, 3, 4], "dict": [2, 3, 4], "google_bleu": 2, "bleu_scor": 2, "rouge1": 2, "rouge2": 2, "instanc": [2, 3], "arbitrari": 2, "chosen": 2, "sentence1": 2, "cat": 2, "sat": 2, "mat": 2, "sentence2": 2, "ate": 2, "3333333333333333": 2, "7272727272727272": 2, "4444444444444445": 2, "generate_summari": 2, "summir": 2, "correspond": [2, 4], "liner": 2, "excerpt": 2, "evaluate_summary_model": 2, "model_benchmark": 2, "models_test": 2, "benchmark_summari": 2, "model_summari": 2, "evaluation_result": 2, "line": 2, "name": [2, 3, 4], "zip": 2, "annual": 2, "stock": [2, 4], "govern": 2, "forward": 2, "reveal": 2, "analyz": [2, 3, 4], 
"statu": 2, "concis": 2, "omit": [2, 4], "essenti": [2, 3, 4], "element": [2, 4], "Its": 2, "adequ": 2, "verbos": 2, "relev": 2, "peripher": 2, "quit": [2, 4], "overli": [2, 4], "simplifi": [2, 4], "miss": 2, "convei": [2, 3], "breadth": 2, "Of": 2, "cours": 2, "abov": 2, "vibe": 2, "visualize_prompt_comparison": 2, "visual": 2, "matplotlib": 2, "radar": 2, "plot": 2, "radar_plot": 2, "show": [2, 3, 4], "tmp": 2, "ipykernel_1652501": 2, "940173201": 2, "userwarn": 2, "figurecanvasagg": 2, "thu": 2, "put": 2, "closest": 2, "largest": 2, "deviat": [2, 4], "suggest": [2, 4], "least": 2, "establish": 2, "otherwis": 2, "qualit": 2, "driven": 2, "mention": [2, 4], "might": [2, 3, 4], "fulli": [2, 3], "nuanc": [2, 3, 4], "especi": [2, 3, 4], "those": [2, 3, 4], "primarili": 2, "granular": [2, 3], "altern": [2, 3], "section": [2, 3, 4], "fall": 2, "judg": 2, "themselv": 2, "act": 2, "paper": [2, 4], "main": [2, 3, 4], "instruct": [2, 3, 4], "fine": [2, 4], "tune": [2, 4], "assign": 2, "likert": 2, "style": 2, "pairwis": 2, "ensembl": 2, "repeatedli": 2, "domain": 2, "procedur": 2, "fluenci": 2, "interpret": 2, "refin": 2, "excel": [2, 4], "narr": 2, "flow": [2, 3], "mirror": 2, "guidelin": 2, "express": [2, 4], "similarli": 2, "notabl": [2, 4], "properli": [2, 4], "henc": 2, "worth": 2, "integ": 2, "rubric": 2, "foster": [2, 4], "hollist": 2, "judgeevalu": 2, "enforc": [2, 4], "four": 2, "grammar": [2, 4], "evaluate_with_llm": 2, "candid": 2, "pars": [2, 4], "criterion": 2, "basemodel": [2, 4], "class": [2, 3, 4], "judge_model": 2, "candidate_summari": 2, "specifi": [2, 3, 4], "wa": [2, 4], "written": 2, "grammat": 2, "y": 2, "z": 2, "w": [2, 3], "beta": [2, 4], "response_format": [2, 4], "Then": 2, "benchmark_model": 2, "test_model": 2, "input_text": [2, 3], "tupl": 2, "iphon": [2, 4], "mac": [2, 4], "ipad": [2, 4], "incorpor": 2, "regard": 2, "obtain": [2, 4], "respect": 2, "regist": 2, "approxim": [2, 4], "6": [2, 3, 4], "trillion": [2, 4], "held": [2, 4], "affili": [2, 4], "billion": 2, "outstand": [2, 4], "octob": [2, 4], "18": [2, 4], "7": [2, 3], "8": [2, 3], "evals_list": 2, "1775618912": 2, "14": [2, 4], "some": [2, 3, 4], "achiev": [2, 4], "variant": 2, "slightli": 2, "indic": [2, 4], "drift": 2, "had": 2, "lowest": 2, "overal": [2, 3, 4], "drop": 2, "substanti": 2, "gradient": 2, "visibl": 2, "mark": 2, "degrad": [2, 4], "firstli": 2, "overhead": 2, "neglect": 2, "exhibit": 2, "prefer": [2, 4], "posit": [2, 3, 4], "egocentr": 2, "tight": 2, "small": [2, 4], "field": [2, 4], "financ": 2, "law": 2, "suitabl": 2, "serv": [2, 3, 4], "aproach": 2, "significantli": [2, 3], "workflow": [2, 4], "assessor": 2, "intens": [2, 4], "aplic": 2, "aim": [2, 3, 4], "clearli": [2, 4], "earlier": 2, "depict": [2, 4], "higher": 2, "correl": 2, "were": [2, 4], "multilingu": 2, "golden": 2, "recruit": 2, "languang": 2, "arena": 2, "vote": 2, "made": [2, 3, 4], "under": [2, 4], "blind": 2, "randomli": 2, "pair": 2, "submit": 2, "actual": [2, 3, 4], "loop": 2, "customiz": 2, "irrelev": 2, "unhelp": 2, "sometim": [2, 4], "though": [2, 4], "occasion": 2, "regularli": 2, "inquiri": 2, "rare": 2, "inaccuraci": 2, "highli": [2, 4], "perfectli": 2, "cater": 2, "polici": 2, "benefit": [2, 4], "critiqu": 2, "elo": 2, "democrat": [2, 4], "simul": [2, 4], "thought": [2, 4], "exam": 2, "probe": 2, "certifi": 2, "identif": 2, "histori": 2, "prioriti": 2, "intellig": 2, "move": [2, 3], "began": 2, "2018": 2, "introduct": [2, 3], "glue": 2, "wang": 2, "2019": 2, "entail": 2, "baselin": 2, "superglu": 2, "expand": 2, 
"deeper": [2, 3], "successor": 2, "grew": 2, "broader": 2, "big": 2, "bench": 2, "srivastava": 2, "2023": 2, "turn": 2, "200": 2, "span": 2, "arithmet": 2, "collabor": 2, "truthfulqa": 2, "lin": [2, 4], "accur": [2, 4], "decept": 2, "increasingli": [2, 4], "multitask": 2, "hendryck": 2, "2021": 2, "multidisciplinari": 2, "57": 2, "stanford": 2, "helm": 2, "liang": 2, "multidimension": 2, "concern": 2, "surround": [2, 4], "emphas": [2, 4], "humanev": 2, "chen": [2, 4], "lmsy": 2, "brought": 2, "dialogu": 2, "len": [2, 3], "replic": [2, 4], "find": [2, 3, 4], "industri": [2, 4], "chatbot": 2, "chiang": 2, "direct": 2, "live": 2, "gather": 2, "000": [2, 4], "assist": [2, 4], "alpacaev": 2, "duboi": 2, "mt": 2, "zheng": 2, "Their": [2, 4], "newer": 2, "render": 2, "ineffect": 2, "crowdsourc": 2, "own": [2, 3], "livebench": 2, "white": 2, "resili": 2, "competit": 2, "free": 2, "70": 2, "meaningfulli": 2, "monthli": 2, "came": 2, "center": [2, 4], "arc": 2, "prize": 2, "chollet": 2, "mike": 2, "knoop": 2, "co": 2, "founder": 2, "zapier": 2, "fran\u00e7oi": 2, "creator": 2, "agi": 2, "kera": 2, "narrow": 2, "suffici": [2, 4], "meaning": [2, 3, 4], "capac": 2, "genuin": 2, "accord": 2, "econom": 2, "acquir": 2, "skill": 2, "five": 2, "old": 2, "possess": 2, "count": [2, 3], "elementari": 2, "physic": 2, "novelti": 2, "puzzl": 2, "someth": 2, "wouldn": 2, "vast": 2, "interpol": 2, "memori": [2, 3], "synthes": 2, "fly": 2, "brute": 2, "possibl": [2, 4], "million": 2, "seri": 2, "minim": [2, 4], "prior": 2, "submiss": 2, "pixel": 2, "perfect": 2, "color": 2, "modern": [2, 3, 4], "fourrier": 2, "lightweight": [2, 4], "varieti": 2, "bespok": 2, "via": [2, 4], "sdk": 2, "cli": 2, "been": 2, "extract": [2, 3, 4], "autoregress": 2, "conduct": 2, "sub": 2, "liter": 2, "disturb": 2, "zero": 2, "varianc": 2, "yt": 2, "ut": 2, "uncondit": 2, "33": 2, "suppos": 2, "p": 2, "08": [2, 4], "exactli": [2, 4], "ii": [2, 4], "iv": 2, "iii": 2, "c": [2, 4], "consequ": 2, "ol": 2, "heteroscedast": 2, "regress": 2, "ineffici": 2, "wish": 2, "lag": 2, "var": 2, "bivari": 2, "acceler": 2, "evaluation_track": 2, "evaluationtrack": 2, "model_config": 2, "basemodelconfig": 2, "parallelismmanag": 2, "pipelineparamet": 2, "envconfig": 2, "is_accelerate_avail": 2, "datetim": 2, "timedelta": 2, "initprocessgroupkwarg": 2, "create_evaluation_pipelin": 2, "output_dir": 2, "cache_dir": 2, "pretrain": 2, "dtype": 2, "float16": 2, "max_sampl": 2, "kwargs_handl": 2, "second": [2, 3], "3000": 2, "els": [2, 3], "none": 2, "save_detail": 2, "push_to_hub": 2, "pipeline_param": 2, "launcher_typ": 2, "env_config": 2, "override_batch_s": 2, "use_chat_templ": 2, "trust_remote_cod": 2, "pipeline_paramet": 2, "schemat": [2, 3], "vllm": [2, 4], "tgi": 2, "instanti": 2, "storag": 2, "local": [2, 3, 4], "track": 2, "push": 2, "hub": 2, "parallel": 2, "temporari": 2, "maximum": [2, 3], "num_few_shot": 2, "automat": 2, "string": [2, 4], "vertic": 2, "bar": 2, "few": [2, 3, 4], "binari": 2, "flag": 2, "bigbench": 2, "winogrand": 2, "hellaswag": 2, "nlp": 2, "choos": 2, "1b": 2, "save": [2, 3], "save_and_push_result": 2, "show_result": 2, "model_arg": 2, "download": 2, "remot": 2, "send": [2, 4], "serverless": 2, "dedic": [2, 4], "id": 2, "inference_server_address": 2, "inference_server_auth": 2, "model_id": 2, "null": 2, "bash": 2, "command": 2, "model_config_path": 2, "path": [2, 3], "endpoint_model": 2, "yaml": [2, 4], "llama3": [2, 3], "qwen2": [2, 4], "smollm2": 2, "describ": 2, "3b": 2, "alibaba": [2, 4], "5b": [2, 4], "hui": 2, "yang": 2, 
"compact": 2, "360m": 2, "allal": 2, "9": 2, "trend": [2, 4], "cluster": 2, "degre": 2, "noteworthi": 2, "superior": 2, "taken": [2, 4], "grain": [2, 4], "salt": [2, 4], "100": [2, 4], "give": 2, "trade": [2, 4], "off": [2, 3, 4], "flexibl": [2, 3, 4], "exponenti": 2, "growth": 2, "hug": [2, 4], "ecosystem": 2, "modular": 2, "visit": 2, "offici": 2, "alb": 2, "24": [2, 4], "loubna": 2, "ben": 2, "anton": 2, "lozhkov": 2, "eli": 2, "bakouch": 2, "gabriel": 2, "mart\u00edn": 2, "bl\u00e1zquez": 2, "lewi": 2, "tunstal": 2, "agust\u00edn": 2, "piquer": 2, "andr": 2, "marafioti": 2, "cyril": 2, "zakka": 2, "leandro": 2, "von": 2, "werra": 2, "thoma": 2, "wolf": 2, "are24": 2, "judgearena": 2, "ctj": 2, "21": 2, "jerri": 2, "tworek": 2, "heewoo": 2, "jun": 2, "qime": 2, "yuan": 2, "henriqu": 2, "pond": 2, "de": 2, "oliveira": 2, "pinto": 2, "jare": 2, "kaplan": 2, "harri": 2, "edward": 2, "yuri": 2, "burda": 2, "nichola": 2, "joseph": 2, "greg": 2, "brockman": 2, "alex": 2, "rai": 2, "raul": 2, "puri": 2, "gretchen": 2, "krueger": 2, "michael": [2, 4], "petrov": 2, "heidi": 2, "khlaaf": 2, "girish": 2, "sastri": 2, "pamela": 2, "mishkin": 2, "brook": 2, "chan": 2, "scott": 2, "grai": 2, "nick": 2, "ryder": 2, "mikhail": 2, "pavlov": 2, "alethea": 2, "lukasz": 2, "kaiser": 2, "mohammad": 2, "bavarian": 2, "clemen": 2, "winter": 2, "philipp": 2, "tillet": 2, "felip": 2, "petroski": 2, "Such": 2, "dave": 2, "cum": 2, "matthia": 2, "plappert": 2, "fotio": 2, "chantzi": 2, "elizabeth": 2, "barn": 2, "ariel": 2, "herbert": 2, "voss": 2, "william": 2, "hebgen": 2, "guss": 2, "nichol": 2, "paino": 2, "nikola": 2, "tezak": 2, "jie": 2, "tang": 2, "igor": 2, "babuschkin": 2, "suchir": 2, "balaji": 2, "shantanu": 2, "jain": 2, "saunder": 2, "christoph": 2, "hess": 2, "andrew": 2, "carr": 2, "jan": 2, "leik": 2, "josh": 2, "achiam": 2, "vedant": 2, "misra": 2, "evan": 2, "morikawa": 2, "alec": 2, "radford": 2, "matthew": 2, "knight": 2, "mile": 2, "brundag": 2, "mira": 2, "murati": 2, "kati": 2, "mayer": 2, "peter": 2, "welind": 2, "bob": [2, 4], "mcgrew": 2, "dario": 2, "amodei": 2, "sam": 2, "mccandlish": 2, "ilya": 2, "sutskev": 2, "wojciech": 2, "zaremba": 2, "url": [2, 4], "arxiv": [2, 4], "org": [2, 4], "ab": [2, 4], "2107": 2, "03374": 2, "cz": 2, "lianmin": 2, "ying": 2, "sheng": 2, "anastasio": 2, "angelopoulo": 2, "tianl": 2, "dacheng": 2, "hao": 2, "zhang": 2, "banghua": 2, "zhu": 2, "jordan": 2, "gonzalez": 2, "ion": 2, "stoica": 2, "2403": 2, "04132": 2, "cho24": 2, "francoi": 2, "websit": 2, "arcpriz": 2, "dglh24": 2, "yann": 2, "bal\u00e1z": 2, "galambosi": 2, "perci": 2, "tatsunori": 2, "hashimoto": 2, "debia": 2, "2404": 2, "04475": 2, "fac24a": 2, "wiki": [2, 4], "fac24b": 2, "fac24c": 2, "doc": [2, 3, 4], "model_doc": 2, "gpt2": 2, "fac24d": 2, "cookbook": 2, "en": [2, 4], "llm_judg": 2, "fac24": 2, "fac24f": 2, "space": 2, "blog": 2, "fhwt23": 2, "cl\u00e9mentin": 2, "nathan": 2, "habib": 2, "hbb": 2, "dan": 2, "collin": 2, "burn": 2, "steven": 2, "basart": 2, "andi": 2, "zou": 2, "manta": 2, "mazeika": 2, "dawn": 2, "song": 2, "jacob": 2, "steinhardt": 2, "2009": 2, "03300": 2, "hbd": 2, "20": [2, 4], "ari": 2, "bui": 2, "du": 2, "maxwel": 2, "forb": 2, "yejin": 2, "choi": 2, "curiou": 2, "neural": [2, 4], "degener": 2, "1904": 2, "09751": 2, "hyc": 2, "binyuan": 2, "jian": 2, "zeyu": 2, "cui": 2, "jiaxi": 2, "dayiheng": 2, "liu": [2, 4], "lei": 2, "tianyu": 2, "jiajun": 2, "bowen": 2, "yu": 2, "kai": 2, "dang": 2, "coder": 2, "preprint": [2, 4], "2409": 2, "12186": 2, "lx": 2, "zhen": 
2, "xiaohan": 2, "xu": 2, "tao": 2, "shen": 2, "jia": 2, "gu": 2, "yuxuan": 2, "lai": 2, "chongyang": 2, "shuai": 2, "ma": 2, "nlg": 2, "2401": 2, "07103": 2, "lbl": 2, "23": 2, "rishi": 2, "bommasani": 2, "toni": 2, "lee": [2, 4], "dimitri": 2, "tsipra": 2, "dilara": 2, "soylu": 2, "michihiro": 2, "yasunaga": 2, "yian": 2, "deepak": 2, "narayanan": 2, "yuhuai": 2, "wu": [2, 4], "ananya": 2, "kumar": 2, "benjamin": 2, "newman": 2, "binhang": 2, "bobbi": 2, "yan": 2, "ce": 2, "christian": 2, "cosgrov": 2, "man": 2, "r\u00e9": 2, "diana": 2, "acosta": 2, "nava": 2, "drew": 2, "hudson": 2, "eric": 2, "zelikman": 2, "esin": 2, "durmu": 2, "faisal": 2, "ladhak": 2, "frieda": 2, "rong": 2, "hongyu": 2, "ren": 2, "huaxiu": 2, "yao": 2, "jue": 2, "keshav": 2, "santhanam": 2, "laurel": 2, "orr": 2, "lucia": 2, "mert": 2, "yuksekgonul": 2, "mirac": 2, "suzgun": 2, "kim": 2, "neel": 2, "guha": 2, "niladri": 2, "chatterji": 2, "omar": 2, "khattab": 2, "henderson": 2, "qian": 2, "huang": 2, "ryan": 2, "chi": [2, 4], "sang": 2, "xie": 2, "shibani": 2, "santurkar": 2, "surya": 2, "ganguli": 2, "icard": 2, "tianyi": 2, "vishrav": 2, "chaudhari": 2, "xuechen": 2, "yifan": 2, "yuhui": 2, "yuta": 2, "koreeda": 2, "2211": 2, "09110": 2, "lhe22": 2, "stephani": 2, "hilton": 2, "owain": 2, "mimic": 2, "falsehood": 2, "2109": 2, "07958": 2, "srr": 2, "aarohi": 2, "abhinav": 2, "rastogi": 2, "abhishek": 2, "rao": 2, "abu": 2, "awal": 2, "md": [2, 4], "shoeb": 2, "abubakar": 2, "abid": 2, "adam": 2, "fisch": 2, "brown": 2, "santoro": 2, "aditya": 2, "gupta": 2, "adri\u00e0": 2, "garriga": 2, "alonso": 2, "agnieszka": 2, "kluska": 2, "aitor": 2, "lewkowycz": 2, "akshat": 2, "agarw": 2, "warstadt": 2, "alexand": [2, 4], "kocurek": 2, "ali": 2, "safaya": 2, "tazarv": 2, "alic": [2, 4], "xiang": 2, "alicia": 2, "parrish": 2, "allen": 2, "nie": 2, "aman": 2, "hussain": 2, "amanda": 2, "askel": 2, "dsouza": 2, "ambros": 2, "slone": 2, "ameet": 2, "rahan": 2, "anantharaman": 2, "iyer": 2, "ander": 2, "andreassen": 2, "andrea": 2, "madotto": 2, "santilli": 2, "stuhlm\u00fcller": 2, "dai": [2, 4], "la": 2, "lampinen": 2, "angela": 2, "jiang": 2, "angelica": 2, "anh": 2, "vuong": 2, "animesh": 2, "anna": 2, "gottardi": 2, "antonio": 2, "norelli": 2, "anu": 2, "venkatesh": 2, "arash": 2, "gholamidavoodi": 2, "arfa": 2, "tabassum": 2, "arul": 2, "menez": 2, "arun": 2, "kirubarajan": 2, "asher": 2, "mullokandov": 2, "ashish": 2, "sabharw": 2, "austin": 2, "herrick": 2, "avia": 2, "efrat": 2, "aykut": 2, "erdem": 2, "ayla": 2, "karaka\u015f": 2, "robert": 2, "bao": 2, "loe": 2, "barret": 2, "zoph": 2, "bart\u0142omiej": 2, "bojanowski": 2, "batuhan": 2, "\u00f6zyurt": 2, "behnam": 2, "hedayatnia": 2, "neyshabur": 2, "inden": 2, "benno": 2, "stein": 2, "berk": 2, "ekmekci": 2, "bill": 2, "yuchen": 2, "blake": 2, "howald": 2, "bryan": 2, "orinion": 2, "cameron": [2, 4], "diao": 2, "dour": 2, "catherin": 2, "stinson": 2, "cedrick": 2, "argueta": 2, "c\u00e9sar": 2, "ferri": 2, "ram\u00edrez": 2, "chandan": 2, "singh": 2, "charl": 2, "rathkopf": 2, "chenlin": 2, "meng": 2, "chitta": 2, "baral": 2, "chiyu": 2, "chri": 2, "callison": 2, "burch": 2, "wait": 2, "voigt": 2, "pott": 2, "cindi": 2, "ramirez": 2, "clara": 2, "rivera": 2, "clemencia": 2, "siro": 2, "colin": 2, "raffel": 2, "courtnei": 2, "ashcraft": 2, "cristina": 2, "garbacea": 2, "damien": 2, "sileo": 2, "garrett": 2, "kilman": 2, "roth": 2, "daniel": 2, "freeman": 2, "khashabi": 2, "levi": 2, "mosegu\u00ed": 2, "gonz\u00e1lez": 2, "perszyk": 2, "danni": 2, "hernandez": 
2, "danqi": 2, "daphn": 2, "ippolito": 2, "dar": 2, "gilboa": 2, "david": 2, "dohan": 2, "drakard": 2, "jurgen": 2, "debajyoti": 2, "datta": 2, "deep": 2, "deni": 2, "emelin": 2, "kleyko": 2, "deniz": 2, "yuret": 2, "derek": 2, "tam": [2, 4], "dieuwk": 2, "hupk": 2, "diganta": 2, "dilyar": 2, "buzan": 2, "coelho": 2, "mollo": 2, "diyi": 2, "dong": 2, "ho": 2, "dylan": 2, "schrader": 2, "ekaterina": 2, "shutova": 2, "ekin": 2, "dogu": 2, "cubuk": 2, "elad": 2, "segal": 2, "eleanor": 2, "hagerman": 2, "donowai": 2, "elli": 2, "pavlick": 2, "emanuel": 2, "rodola": 2, "emma": 2, "lam": 2, "chu": 2, "erkut": 2, "erni": 2, "ethan": 2, "dyer": 2, "jerzak": 2, "eunic": 2, "engefu": 2, "manyasi": 2, "evgenii": 2, "zheltonozhskii": 2, "fanyu": 2, "xia": 2, "fatemeh": 2, "siar": 2, "fernando": 2, "mart\u00ednez": 2, "plume": 2, "francesca": 2, "happ\u00e9": 2, "gaurav": 2, "mishra": 2, "genta": 2, "indra": 2, "winata": 2, "gerard": 2, "melo": 2, "germ\u00e1n": 2, "kruszewski": 2, "giambattista": 2, "parascandolo": 2, "giorgio": 2, "mariani": 2, "gloria": 2, "gonzalo": 2, "jaimovitch": 2, "l\u00f3pez": 2, "gregor": 2, "betz": 2, "gui": 2, "gur": 2, "hana": 2, "galijasev": 2, "hannah": 2, "rashkin": 2, "hannaneh": 2, "hajishirzi": 2, "harsh": 2, "mehta": 2, "hayden": 2, "bogar": 2, "henri": 2, "shevlin": 2, "hinrich": 2, "sch\u00fctze": 2, "hiromu": 2, "yakura": 2, "hongm": 2, "hugh": 2, "mee": 2, "wong": 2, "ian": 2, "ng": 2, "isaac": 2, "nobl": 2, "jaap": 2, "jumelet": 2, "jack": 2, "geissing": 2, "jackson": 2, "kernion": 2, "jaehoon": 2, "jaim": 2, "fern\u00e1ndez": 2, "fisac": 2, "jame": 2, "simon": 2, "koppel": 2, "koco\u0144": 2, "jana": 2, "thompson": 2, "janel": 2, "wingfield": 2, "jarema": 2, "radom": 2, "jascha": 2, "sohl": 2, "dickstein": 2, "jason": 2, "phang": 2, "yosinski": 2, "jekaterina": 2, "novikova": 2, "jell": 2, "bosscher": 2, "jennif": 2, "marsh": 2, "jeremi": 2, "jeroen": 2, "taal": 2, "jess": 2, "engel": 2, "jesujoba": 2, "alabi": 2, "jiacheng": 2, "jiam": 2, "jillian": 2, "joan": 2, "waweru": 2, "john": 2, "burden": 2, "miller": 2, "bali": 2, "jonathan": 2, "batcheld": 2, "berant": 2, "j\u00f6rg": 2, "frohberg": 2, "jo": 2, "rozen": 2, "jose": 2, "orallo": 2, "boudeman": 2, "guerr": 2, "jone": 2, "joshua": 2, "tenenbaum": 2, "rule": [2, 3, 4], "joyc": 2, "chua": 2, "kamil": 2, "kanclerz": 2, "karen": 2, "livescu": 2, "karl": 2, "krauth": 2, "karthik": 2, "gopalakrishnan": 2, "katerina": 2, "ignatyeva": 2, "katja": 2, "markert": 2, "kaustubh": 2, "dhole": 2, "kevin": 2, "gimpel": 2, "omondi": 2, "kori": 2, "mathewson": 2, "kristen": 2, "chiafullo": 2, "ksenia": 2, "shkaruta": 2, "shridhar": 2, "kyle": 2, "mcdonel": 2, "richardson": 2, "laria": 2, "reynold": 2, "leo": 2, "gao": 2, "liam": 2, "dugan": 2, "lianhui": 2, "qin": 2, "lidia": 2, "contrera": 2, "ochando": 2, "loui": 2, "morenc": 2, "luca": [2, 4], "moschella": 2, "luci": 2, "ludwig": 2, "schmidt": 2, "luheng": 2, "lui": 2, "olivero": 2, "col\u00f3n": 2, "luke": 2, "metz": 2, "l\u00fctfi": 2, "kerem": 2, "\u015fenel": 2, "maarten": 2, "bosma": 2, "sap": 2, "maartj": 2, "hoev": 2, "maheen": 2, "farooqi": 2, "manaal": 2, "faruqui": 2, "marco": 2, "baturan": 2, "marelli": 2, "maru": 2, "maria": 2, "quintana": 2, "mari": 2, "tolkiehn": 2, "mario": 2, "giulianelli": 2, "martha": 2, "martin": 2, "potthast": 2, "l": 2, "leavitt": 2, "hagen": 2, "m\u00e1ty\u00e1": 2, "schubert": 2, "medina": 2, "orduna": 2, "baitemirova": 2, "melodi": 2, "arnaud": 2, "melvin": 2, "mcelrath": 2, "yee": 2, "cohen": 2, "ivanitskii": 2, "starritt": 
2, "strube": 2, "micha\u0142": 2, "sw\u0119drowski": 2, "michel": 2, "bevilacqua": 2, "mihir": 2, "kale": 2, "cain": 2, "mime": 2, "mitch": 2, "walker": 2, "mo": 2, "tiwari": 2, "mohit": 2, "bansal": 2, "moin": 2, "aminnaseri": 2, "mor": 2, "geva": 2, "mozhdeh": 2, "gheini": 2, "mukund": 2, "varma": 2, "nanyun": 2, "peng": 2, "nayeon": 2, "neta": 2, "krakov": 2, "doiron": 2, "nicol": 2, "martinez": 2, "nikita": 2, "nangia": 2, "nikla": 2, "decker": 2, "muennighoff": 2, "nitish": 2, "shirish": 2, "keskar": 2, "niveditha": 2, "noah": 2, "constant": 2, "fiedel": 2, "nuan": 2, "wen": 2, "oliv": 2, "agha": 2, "elbaghdadi": 2, "omer": 2, "moreno": 2, "casar": 2, "parth": 2, "doshi": 2, "pascal": 2, "fung": 2, "paul": 2, "pu": 2, "vicol": 2, "pegah": 2, "alipoormolabashi": 2, "peiyuan": 2, "liao": 2, "eckerslei": 2, "phu": 2, "mon": 2, "htut": 2, "pinyu": 2, "hwang": 2, "piotr": 2, "mi\u0142kowski": 2, "piyush": 2, "patil": 2, "pouya": 2, "pezeshkpour": 2, "priti": 2, "oli": 2, "qiaozhu": 2, "mei": 2, "qing": 2, "lyu": 2, "qinlang": 2, "rabin": 2, "banjad": 2, "rachel": 2, "etta": 2, "rudolph": 2, "raefer": 2, "rahel": 2, "haback": 2, "ramon": 2, "risco": 2, "rapha\u00ebl": 2, "milli\u00e8r": 2, "rhythm": 2, "garg": 2, "rif": 2, "saurou": 2, "riku": 2, "arakawa": 2, "robb": 2, "raymaek": 2, "frank": 2, "rohan": 2, "sikand": 2, "roman": 2, "novak": 2, "sitelew": 2, "ronan": 2, "lebra": 2, "rosann": 2, "rowan": 2, "rui": [2, 4], "ruslan": 2, "salakhutdinov": 2, "stoval": 2, "teehan": 2, "rylan": 2, "sahib": 2, "saif": 2, "sajant": 2, "anand": 2, "dillav": 2, "shleifer": 2, "wiseman": 2, "samuel": 2, "gruetter": 2, "bowman": 2, "schoenholz": 2, "sanghyun": 2, "han": 2, "sanjeev": 2, "kwatra": 2, "sarah": 2, "rou": 2, "sarik": 2, "ghazarian": 2, "sayan": 2, "ghosh": 2, "sean": 2, "casei": 2, "sebastian": 2, "bischoff": 2, "gehrmann": 2, "schuster": 2, "sepideh": 2, "sadeghi": 2, "shadi": 2, "hamdan": 2, "sharon": 2, "zhou": 2, "shashank": 2, "sherri": 2, "shi": 2, "shikhar": 2, "shima": 2, "asaadi": 2, "shixiang": 2, "shane": 2, "shubh": 2, "pachchigar": 2, "shubham": 2, "toshniw": 2, "shyam": 2, "upadhyai": 2, "shyamolima": 2, "debnath": 2, "siamak": 2, "shakeri": 2, "thormey": 2, "melzi": 2, "siva": 2, "reddi": 2, "sneha": 2, "priscilla": 2, "makini": 2, "soo": 2, "hwan": 2, "spencer": 2, "toren": 2, "sriharsha": 2, "hatwar": 2, "stanisla": 2, "dehaen": 2, "stefan": 2, "divic": 2, "stefano": 2, "ermon": 2, "stella": 2, "biderman": 2, "stephen": 2, "prasad": 2, "piantadosi": 2, "stuart": 2, "shieber": 2, "summer": 2, "misherghi": 2, "svetlana": 2, "kiritchenko": 2, "swaroop": 2, "tal": 2, "linzen": 2, "tariq": 2, "tatsu": 2, "te": 2, "th\u00e9o": 2, "desbord": 2, "theodor": 2, "rothschild": 2, "phan": 2, "tiberiu": 2, "nkinyili": 2, "timo": 2, "schick": 2, "timofei": 2, "kornev": 2, "titu": 2, "tunduni": 2, "tobia": 2, "gerstenberg": 2, "trenton": 2, "trishala": 2, "neeraj": 2, "tushar": 2, "khot": 2, "tyler": 2, "shultz": 2, "uri": 2, "shaham": 2, "vera": 2, "demberg": 2, "victoria": 2, "nyamai": 2, "vika": 2, "raunak": 2, "vinai": 2, "ramasesh": 2, "udai": 2, "prabhu": 2, "vishakh": 2, "padmakumar": 2, "vivek": 2, "srikumar": 2, "fedu": 2, "wout": 2, "vossen": 2, "xiaoyu": 2, "tong": 2, "xinran": 2, "zhao": 2, "xinyi": 2, "xudong": 2, "yadollah": 2, "yaghoobzadeh": 2, "yair": 2, "lakretz": 2, "yangqiu": 2, "yasaman": 2, "bahri": 2, "yichi": 2, "yide": 2, "yifu": 2, "yonatan": 2, "belinkov": 2, "hou": 2, "yufang": 2, "yuntao": 2, "bai": 2, "zachari": 2, "seid": 2, "zhuoy": 2, "zijian": 2, "ziji": 
2, "j": [2, 4], "zirui": 2, "ziyi": 2, "imit": 2, "game": 2, "extrapol": 2, "2206": 2, "04615": 2, "wpn": 2, "19": 2, "yada": 2, "pruksachatkun": 2, "amanpreet": 2, "julian": 2, "felix": 2, "hill": 2, "stickier": 2, "wsm": 2, "1804": 2, "07461": 2, "wtb": 2, "22": 2, "yi": [2, 4], "tai": 2, "borgeaud": 2, "dani": 2, "yogatama": 2, "denni": 2, "donald": 2, "metzler": 2, "ed": 2, "h": 2, "oriol": 2, "vinyal": 2, "jeff": 2, "dean": 2, "07682": 2, "wdr": 2, "doolei": 2, "manlei": 2, "arka": 2, "pal": 2, "feuer": 2, "siddhartha": 2, "ravid": 2, "shwartz": 2, "ziv": 2, "khalid": 2, "saifullah": 2, "siddartha": 2, "naidu": 2, "chinmai": 2, "hegd": 2, "lecun": 2, "tom": 2, "goldstein": 2, "willi": 2, "neiswang": 2, "micah": 2, "goldblum": 2, "2406": 2, "19314": 2, "yyh": 2, "baosong": 2, "bo": 2, "chengpeng": 2, "chengyuan": 2, "fei": 2, "guant": 2, "haoran": 2, "huan": 2, "jialong": 2, "jialin": 2, "jianhong": 2, "tu": 2, "jianwei": 2, "jianxin": 2, "jin": 2, "jingren": 2, "jinz": 2, "jinzheng": 2, "junyang": 2, "keme": 2, "lu": 2, "keqin": 2, "kexin": 2, "mingfeng": 2, "xue": 2, "na": 2, "ni": 2, "pei": 2, "ru": 2, "men": 2, "ruiz": 2, "runji": 2, "shiji": 2, "sinan": 2, "tan": 2, "tianhang": 2, "tianhao": 2, "wenbin": 2, "ge": 2, "xiaodong": 2, "deng": 2, "xiaohuan": 2, "xingzhang": 2, "xinyu": 2, "xipin": 2, "xuancheng": 2, "fan": 2, "yichang": 2, "wan": 2, "yunfei": 2, "yuqiong": 2, "zhenru": 2, "zhihao": 2, "2407": 2, "10671": 2, "zc": 2, "siyuan": 2, "zhuang": 2, "zhanghao": 2, "yonghao": 2, "zi": 2, "zhuohan": 2, "xing": 2, "2306": 2, "05685": 2, "huggingface24": 2, "12": [2, 3], "06": [2, 4], "metaai24": 2, "promptfoo24": 2, "toolkit": 2, "www": 2, "dev": 2, "go": [3, 4], "far": 3, "possibli": 3, "eliot": 3, "english": 3, "thumb": 3, "\u00be": 3, "max_output_token": 3, "4096": 3, "16384": 3, "contrari": 3, "surpass": 3, "stop": 3, "mid": 3, "truncat": 3, "max_input_token": 3, "input_cost_per_token": 3, "output_cost_per_token": 3, "11b": 3, "v1": 3, "128000": 3, "5e": 3, "sonnet": 3, "20241022": 3, "8192": 3, "200000": 3, "3e": 3, "0613": 3, "6e": 3, "04": 3, "09": 3, "1e": 3, "gemini": 3, "flash": 3, "002": 3, "1048576": 3, "pro": 3, "2097152": 3, "05e": 3, "pose": [3, 4], "incomplet": 3, "extens": [3, 4], "articl": 3, "abruptli": 3, "cut": 3, "disrupt": 3, "shallow": 3, "thorough": 3, "receiv": 3, "partial": 3, "dissatisfact": 3, "frustrat": 3, "educ": 3, "creation": 3, "feasibl": 3, "split": 3, "previou": [3, 4], "10k": 3, "diagram": 3, "charactertextsplitt": 3, "tiktoken": 3, "sequenti": 3, "chain": 3, "newlin": 3, "broadli": [3, 4], "decid": 3, "want": 3, "sure": [3, 4], "lost": 3, "cheap": 3, "speciali": 3, "advantag": [3, 4], "naiv": 3, "period": 3, "nltk": 3, "spaci": 3, "recurs": 3, "divid": 3, "hierarch": 3, "manner": [3, 4], "talk": 3, "theme": 3, "topic": [3, 4], "splitter": 3, "markdown": 3, "html": [3, 4], "get_chunk": 3, "chunk_siz": 3, "chunk_overlap": 3, "langchain_text_splitt": 3, "text_splitt": 3, "from_tiktoken_encod": 3, "split_text": 3, "persona": 3, "assum": 3, "task": [3, 4], "action": 3, "langchain_cor": [3, 4], "prompttempl": 3, "get_base_prompt_templ": 3, "base_prompt": [3, 4], "from_templ": 3, "llmchain": 3, "togeth": 3, "parser": [3, 4], "output_pars": 3, "stroutputpars": 3, "langchain_commun": 3, "chat_model": 3, "chatlitellm": 3, "get_llm_chain": 3, "prompt_templ": [3, 4], "llm_chain": [3, 4], "api_key_label": 3, "upper": 3, "_api_kei": 3, "api_kei": 3, "get_dynamic_prompt_templ": 3, "get_dynamic_prompt_param": 3, "prompt_param": 3, "part_idx": 3, 
"total_part": 3, "chat_context": 3, "origin": [3, 4], "part": [3, 4], "total": [3, 4], "param": 3, "dynamic_prompt_param": 3, "copi": 3, "elif": 3, "last": [3, 4], "merg": 3, "concaten": 3, "generate_report": 3, "input_cont": 3, "llm_model_nam": 3, "report_part": 3, "num_part": 3, "dinam": 3, "priovid": 3, "enumer": 3, "invok": [3, 4], "cummul": 3, "join": 3, "max_chunk_s": 3, "max_chunk_overlap": 3, "latest": [3, 4], "readabl": 3, "apple_report": 3, "300": 3, "disclos": [3, 4], "state": [3, 4], "luation": 3, "oblig": 3, "cash": 3, "disciplin": 3, "smooth": 3, "upon": 3, "subhead": 3, "adher": [3, 4], "revenu": [3, 4], "segment": [3, 4], "liquid": 3, "capit": [3, 4], "despit": [3, 4], "depth": 3, "overlook": 3, "mitig": [3, 4], "fit": [3, 4], "within": [3, 4], "preserv": 3, "easier": [3, 4], "preprocess": 3, "enhanc": [3, 4], "necessit": 3, "meticul": 3, "retain": 3, "necessari": 3, "seamlessli": 3, "circumv": 3, "therebi": 3, "escal": 3, "frequenc": 3, "volum": 3, "bottleneck": 3, "latenc": 3, "friendli": 3, "mustafa": 3, "suleyman": 3, "infinit": 3, "amount": [3, 4], "fewer": 3, "compress": 3, "progress": 3, "condens": 3, "adjust": [3, 4], "constrain": [3, 4], "collect": 3, "versatil": 3, "drive": [3, 4], "grace": 3, "fallback": 3, "empow": 3, "crucial": [3, 4], "stai": 3, "full": [3, 4], "langchain24": 3, "how_to": 3, "07": [3, 4], "freedom": 4, "thrive": 4, "julia": 4, "easili": 4, "notebook": 4, "overrid": 4, "response_cont": 4, "wow": 4, "lot": 4, "breakdown": 4, "stream": 4, "portfolio": 4, "impress": 4, "notic": 4, "march": 4, "29": 4, "huge": 4, "investor": 4, "figur": 4, "compli": 4, "ye": 4, "date": 4, "serious": 4, "is_json": 4, "myjson": 4, "except": 4, "valueerror": 4, "lack": 4, "googl": 4, "survei": 4, "51": 4, "trial": 4, "elicit": 4, "consum": 4, "wrangl": 4, "conform": 4, "ad": 4, "hoc": 4, "streamlin": 4, "subsequ": 4, "modul": 4, "dataset": 4, "unwant": 4, "neg": 4, "ui": 4, "restrict": 4, "mobil": 4, "devic": 4, "overflow": 4, "overwhelm": 4, "twitter": 4, "youtub": 4, "impos": 4, "publish": 4, "successfulli": 4, "adopt": 4, "emploi": 4, "schema": 4, "blueprint": 4, "nativ": 4, "regular": 4, "json_format": 4, "person1": 4, "q1": 4, "person2": 4, "net": 4, "margin": 4, "materi": 4, "nest": 4, "todai": 4, "programmat": 4, "thellm": 4, "unend": 4, "whitespac": 4, "until": 4, "forget": 4, "throw": 4, "appear": 4, "somewher": 4, "json_object": 4, "628": 4, "553": 4, "sheer": 4, "115": 4, "823": 4, "circul": 4, "plai": 4, "vertex": 4, "releas": 4, "suppli": 4, "so": 4, "worri": 4, "enum": 4, "No": 4, "incorrectli": 4, "refus": 4, "simpler": 4, "strongli": 4, "entiti": 4, "place": 4, "secextract": 4, "mentioned_ent": 4, "mentioned_plac": 4, "extract_from_sec_fil": 4, "sec_filing_text": 4, "hint": 4, "attribut": 4, "prompt_extract": 4, "sec_extract": 4, "nasdaq": 4, "llc": 4, "washington": 4, "cupertino": 4, "usabl": 4, "beg": 4, "with_structured_output": 4, "runnabl": 4, "typeddict": 4, "qu": 4, "langchain_openai": 4, "chatopenai": 4, "chatprompttempl": 4, "extract_from_sec_filing_langchain": 4, "structured_llm": 4, "from_messag": 4, "sec_extraction_langchain": 4, "found": 4, "hood": 4, "logit": 4, "raw": 4, "network": 4, "regex": 4, "strong": 4, "enough": 4, "qwen": 4, "label": 4, "unexpect": 4, "malform": 4, "pass": 4, "sec_extraction_outlin": 4, "zsp": 4, "zicorp": 4, "phenomenon": 4, "popular": 4, "cpp": 4, "gbnf": 4, "ggml": 4, "bnf": 4, "ggerganov": 4, "accomplish": 4, "formal": 4, "backu": 4, "naur": 4, "wikipedia": 4, "contributor": 4, "strictli": 4, "soon": 4, 
"curl": 4, "fssl": 4, "sh": 4, "did": 4, "extract_entities_from_sec_fil": 4, "suffix": 4, "ollama_structured_output_prompt_suffix": 4, "ollama_structured_output_temperatur": 4, "mistral": 4, "llama2": 4, "uncensor": 4, "model_json_schema": 4, "response_json": 4, "AND": 4, "wrapper": 4, "exllama2": 4, "mlx": 4, "lm": 4, "enterpris": 4, "commerci": 4, "medium": 4, "low": 4, "done": 4, "know": 4, "chanc": 4, "connect": 4, "encourag": 4, "correctli": 4, "area": 4, "mix": 4, "famili": 4, "furthermor": 4, "nonetheless": 4, "evid": 4, "studi": 4, "wrap": 4, "map": 4, "gemma": 4, "uncov": 4, "wors": 4, "extran": 4, "dispar": 4, "preval": 4, "outdat": 4, "rapidli": 4, "fashion": 4, "remark": 4, "me": 4, "speak": 4, "freeli": 4, "aider": 4, "decod": 4, "hinder": 4, "outweigh": 4, "team": 4, "rebutt": 4, "argu": 4, "v": 4, "compel": 4, "reproduct": 4, "paint": 4, "pictur": 4, "publicli": 4, "independ": 4, "verif": 4, "dottxt": 4, "flaw": 4, "believ": 4, "led": 4, "inaccur": 4, "reconcil": 4, "uneven": 4, "didn": 4, "conflat": 4, "argument": 4, "drawback": 4, "unlock": 4, "wider": 4, "thank": 4, "pfiffer": 4, "hi": 4, "aid24": 4, "dot24": 4, "sai": 4, "demo": 4, "tree": 4, "gge24": 4, "blob": 4, "readm": 4, "llf": 4, "xieyang": 4, "frederick": 4, "fiannaca": 4, "terri": 4, "koo": 4, "dixon": 4, "carri": 4, "cai": 4, "ea": 4, "york": 4, "ny": 4, "usa": 4, "machineri": 4, "doi": 4, "1145": 4, "3613905": 4, "3650756": 4, "ln": 4, "xuan": 4, "hai": 4, "nguyen": 4, "ngoc": 4, "tiviati": 4, "sim": 4, "hieu": 4, "dao": 4, "shafiq": 4, "joti": 4, "kenji": 4, "kawaguchi": 4, "nanci": 4, "min": 4, "yen": 4, "kan": 4, "2408": 4, "08656": 4, "out24": 4, "io": 4, "twt": 4, "zhi": 4, "cheng": 4, "kuang": 4, "tsai": 4, "chieh": 4, "hung": 4, "yun": 4, "nung": 4, "02442": 4, "wikipediacontributors24": 4, "wiktionari": 4, "naur_form": 4}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"introduct": [0, 1, 4], "content": [0, 2, 3, 4], "core": 0, "challeng": 0, "we": 0, "ll": 0, "address": 0, "A": [0, 1], "practic": [0, 1, 4], "approach": 0, "note": 0, "perspect": 0, "who": 0, "thi": 0, "book": 0, "i": 0, "For": 0, "outcom": 0, "prerequisit": 0, "set": 0, "up": 0, "your": 0, "environ": 0, "python": 0, "setup": 0, "api": [0, 4], "kei": [0, 2, 3], "configur": 0, "code": 0, "repositori": 0, "troubleshoot": 0, "common": 0, "issu": 0, "about": 0, "author": 0, "": 0, "tame": 1, "llm": [1, 2], "guid": 1, "pitfal": 1, "open": 1, "sourc": 1, "softwar": [1, 2], "chapter": 1, "1": [1, 3], "2": [1, 3], "wrestl": [1, 4], "structur": [1, 4], "output": [1, 3, 4], "3": [1, 3], "input": 1, "size": [1, 3], "length": [1, 3], "limit": [1, 3], "4": [1, 3], "5": 1, "The": [1, 2], "eval": [1, 2], "gap": [1, 2], "6": 1, "hallucin": 1, "realiti": 1, "7": 1, "safeti": 1, "concern": 1, "8": 1, "cost": [1, 3], "factor": 1, "9": 1, "break": 1, "free": 1, "from": 1, "cloud": 1, "provid": [1, 4], "appendix": 1, "tool": [1, 2, 4], "resourc": 1, "non": 2, "determinist": 2, "gener": [2, 3], "machin": 2, "temperatur": 2, "sampl": 2, "spectrum": 2, "emerg": 2, "properti": 2, "problem": [2, 3, 4], "statement": [2, 3, 4], "tradit": 2, "v": 2, "design": 2, "applic": 2, "test": 2, "requir": 2, "matrix": 2, "conceptu": 2, "overview": 2, "consider": [2, 3], "metric": 2, "evalu": 2, "task": 2, "model": [2, 3], "base": [2, 3], "human": 2, "benchmark": 2, "leaderboard": 2, "lightev": 2, "mmlu": 2, "econometr": 2, "dataset": 2, "famili": 2, "us": 2, "langchain": [2, 4], "promptfoo": 2, "refer": [2, 3, 4], "what": 3, "ar": 3, "token": 3, "comparison": 
[3, 4], "across": 3, "chunk": 3, "contextu": 3, "link": 3, "long": 3, "form": 3, "step": 3, "write": 3, "prompt": [3, 4], "templat": 3, "construct": 3, "dynam": 3, "paramet": 3, "report": 3, "exampl": 3, "usag": 3, "discuss": [3, 4], "implic": 3, "futur": 3, "conclus": [3, 4], "user": 4, "need": 4, "solut": 4, "strategi": 4, "techniqu": 4, "One": 4, "shot": 4, "specif": 4, "json": 4, "mode": 4, "outlin": 4, "ollama": 4, "compar": 4, "framework": 4, "best": 4, "research": 4, "ongo": 4, "debat": 4, "acknowledg": 4}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinxcontrib.bibtex": 9, "sphinx": 57}, "alltitles": {"Introduction": [[0, "introduction"], [4, "introduction"]], "Contents": [[0, "contents"], [2, "contents"], [3, "contents"], [4, "contents"]], "Core Challenges We\u2019ll Address": [[0, "core-challenges-we-ll-address"]], "A Practical Approach": [[0, "a-practical-approach"]], "A Note on Perspective": [[0, "a-note-on-perspective"]], "Who This Book Is For": [[0, "who-this-book-is-for"]], "Outcomes": [[0, "outcomes"]], "Prerequisites": [[0, "prerequisites"]], "Setting Up Your Environment": [[0, "setting-up-your-environment"]], "Python Environment Setup": [[0, "python-environment-setup"]], "API Keys Configuration": [[0, "api-keys-configuration"]], "Code Repository": [[0, "code-repository"]], "Troubleshooting Common Issues": [[0, "troubleshooting-common-issues"]], "About the Author(s)": [[0, "about-the-author-s"]], "Taming LLMs": [[1, "taming-llms"]], "A Practical Guide to LLM Pitfalls with Open Source Software": [[1, "a-practical-guide-to-llm-pitfalls-with-open-source-software"]], "Chapter 1: Introduction": [[1, "chapter-1-introduction"]], "Chapter 2: Wrestling with Structured Output": [[1, "chapter-2-wrestling-with-structured-output"]], "Chapter 3: Input Size and Length Limitations": [[1, "chapter-3-input-size-and-length-limitations"]], "Chapter 4: Output Size and Length Limitations": [[1, "chapter-4-output-size-and-length-limitations"]], "Chapter 5: The Evals Gap": [[1, "chapter-5-the-evals-gap"]], "Chapter 6: Hallucination: The Reality Gap": [[1, "chapter-6-hallucination-the-reality-gap"]], "Chapter 7: Safety Concerns": [[1, "chapter-7-safety-concerns"]], "Chapter 8: The Cost Factor": [[1, "chapter-8-the-cost-factor"]], "Chapter 9: Breaking Free from Cloud Providers": [[1, "chapter-9-breaking-free-from-cloud-providers"]], "Appendix A: Tools and Resources": [[1, "appendix-a-tools-and-resources"]], "The Evals Gap": [[2, "the-evals-gap"]], "Non-Deterministic Generative Machines": [[2, "non-deterministic-generative-machines"]], "Temperature and Sampling": [[2, "temperature-and-sampling"]], "The Temperature Spectrum": [[2, "the-temperature-spectrum"]], "Emerging Properties": [[2, "emerging-properties"]], "Problem Statement": [[2, "problem-statement"], [3, "problem-statement"], [4, "problem-statement"]], "Evals of Traditional Software vs LLMs": [[2, "evals-table"]], "Evals Design": [[2, "evals-design"]], "LLM Application Testing Requirements Matrix": [[2, "validation-requirements"]], "Conceptual Overview": [[2, "conceptual-overview"]], "Design Considerations": [[2, "design-considerations"]], "Metrics": [[2, "metrics"]], "Key Metrics for Evaluating Generative Tasks": [[2, "key-metrics"]], "Evaluators": [[2, "evaluators"]], 
"Model-Based Evaluation": [[2, "model-based-evaluation"]], "Human-Based Evaluation": [[2, "human-based-evaluation"]], "Evaluating Evaluators": [[2, "evaluating-evaluators"]], "Benchmarks and Leaderboards": [[2, "benchmarks-and-leaderboards"]], "Tools": [[2, "tools"]], "LightEval": [[2, "lighteval"]], "MMLU Econometrics Task Dataset sample": [[2, "mmlu-econometrics"]], "Model Families Evaluated Using LightEval": [[2, "model-families"]], "LangChain": [[2, "langchain"], [4, "langchain"]], "PromptFoo": [[2, "promptfoo"]], "References": [[2, "references"], [3, "references"], [4, "references"]], "Output Size Limitations": [[3, "output-size-limitations"]], "What are Token Limits?": [[3, "what-are-token-limits"]], "Token Cost and Length Limitation Comparison Across Key Models": [[3, "token-cost-table"]], "Content Chunking with Contextual Linking": [[3, "content-chunking-with-contextual-linking"]], "Generating long-form content": [[3, "generating-long-form-content"]], "Step 1: Chunking the Content": [[3, "step-1-chunking-the-content"]], "Step 2: Writing the Base Prompt Template": [[3, "step-2-writing-the-base-prompt-template"]], "Step 3: Constructing Dynamic Prompt Parameters": [[3, "step-3-constructing-dynamic-prompt-parameters"]], "Step 4: Generating the Report": [[3, "step-4-generating-the-report"]], "Example Usage": [[3, "example-usage"]], "Discussion": [[3, "discussion"], [4, "discussion"]], "Implications": [[3, "implications"]], "Future Considerations": [[3, "future-considerations"]], "Conclusion": [[3, "conclusion"], [4, "conclusion"]], "Wrestling with Structured Output": [[4, "wrestling-with-structured-output"]], "User Needs": [[4, "user-needs"]], "Solutions": [[4, "solutions"]], "Strategies": [[4, "strategies"]], "Techniques and Tools": [[4, "techniques-and-tools"]], "One-Shot Prompts": [[4, "one-shot-prompts"]], "Structured Output with Provider-Specific APIs": [[4, "structured-output-with-provider-specific-apis"]], "JSON Mode": [[4, "json-mode"]], "Outlines": [[4, "outlines"]], "Ollama": [[4, "ollama"]], "Comparing Solutions": [[4, "comparing-solutions"]], "Structured Output Frameworks Comparison": [[4, "structured-output-frameworks"]], "Best Practices": [[4, "best-practices"]], "Research and Ongoing Debate": [[4, "research-and-ongoing-debate"]], "Acknowledgements": [[4, "acknowledgements"]]}, "indexentries": {}}) \ No newline at end of file +Search.setIndex({"docnames": ["markdown/intro", "markdown/toc", "notebooks/evals", "notebooks/output_size_limit", "notebooks/structured_output"], "filenames": ["markdown/intro.md", "markdown/toc.md", "notebooks/evals.ipynb", "notebooks/output_size_limit.ipynb", "notebooks/structured_output.ipynb"], "titles": ["1. Introduction", "Taming LLMs", "4. The Evals Gap", "2. Output Size Limitations", "3. 
Wrestling with Structured Output"], "terms": {"am": 0, "alwai": [0, 2, 4], "do": [0, 2, 3, 4], "which": [0, 2, 3, 4], "cannot": [0, 2], "order": [0, 2, 4], "mai": [0, 2, 3, 4], "learn": [0, 2], "how": [0, 2, 3, 4], "pablo": [0, 2], "picasso": 0, "In": [0, 2, 3, 4], "recent": [0, 2, 4], "year": [0, 2, 3, 4], "larg": [0, 1, 2, 3, 4], "languag": [0, 1, 2, 3, 4], "model": [0, 1, 4], "llm": [0, 3, 4], "have": [0, 2, 3, 4], "emerg": [0, 1, 4], "transform": [0, 2, 4], "forc": [0, 2, 4], "technologi": [0, 2, 3, 4], "promis": 0, "revolution": 0, "build": [0, 1, 2, 3, 4], "product": [0, 1, 2, 4], "interact": [0, 2, 3, 4], "comput": [0, 2, 3, 4], "from": [0, 2, 3, 4], "chatgpt": [0, 4], "github": [0, 2, 4], "copilot": 0, "claud": [0, 2, 3], "artifact": 0, "system": [0, 2, 3, 4], "captur": [0, 2], "public": [0, 2], "imagin": 0, "spark": 0, "gold": [0, 2], "rush": 0, "ai": [0, 2, 4], "power": [0, 1, 2, 3, 4], "applic": [0, 1, 3, 4], "howev": [0, 2, 3, 4], "beneath": 0, "surfac": [0, 2], "technolog": 0, "revolut": 0, "li": [0, 2], "complex": [0, 2, 3, 4], "landscap": [0, 2], "practition": [0, 2], "must": [0, 2, 3], "navig": [0, 1], "focus": [0, 2, 3, 4], "bring": 0, "awar": [0, 2, 3], "limit": [0, 2, 4], "har": [0, 1, 3], "open": [0, 2, 3, 4], "sourc": [0, 2, 4], "solut": [0, 1, 2, 3], "overcom": [0, 3], "them": [0, 2, 3, 4], "robust": [0, 2, 3, 4], "It": [0, 2, 3, 4], "offer": [0, 2, 3, 4], "critic": [0, 1, 2, 3, 4], "implement": [0, 1, 2, 3, 4], "back": [0, 4], "reproduc": [0, 1, 2], "exampl": [0, 1, 2, 4], "while": [0, 1, 2, 3, 4], "mani": [0, 3, 4], "resourc": [0, 2, 3], "cover": [0, 2, 3], "capabl": [0, 1, 2, 3, 4], "specif": [0, 1, 2, 3], "hidden": 0, "pitfal": 0, "engin": [0, 1, 2, 4], "technic": [0, 1, 2, 3, 4], "manag": [0, 1, 2, 3], "face": [0, 2, 4], "when": [0, 1, 2, 3, 4], "comprehens": [0, 1, 2, 3, 4], "guid": [0, 2, 4], "leverag": [0, 2, 3, 4], "battl": [0, 1], "test": [0, 1, 4], "tool": [0, 3], "throughout": [0, 3, 4], "tackl": [0, 2], "follow": [0, 2, 3, 4], "non": [0, 1, 4], "exhaust": 0, "list": [0, 2, 3, 4], "structur": [0, 2, 3], "un": 0, "reliabl": [0, 2, 4], "struggl": [0, 2, 4], "maintain": [0, 2, 3, 4], "consist": [0, 2, 3, 4], "output": [0, 2], "format": [0, 2, 3, 4], "complic": 0, "integr": [0, 2, 4], "larger": [0, 2, 3, 4], "make": [0, 2, 3, 4], "error": [0, 2, 4], "handl": [0, 1, 2, 3, 4], "more": [0, 2, 3, 4], "size": [0, 2, 4], "length": [0, 2, 4], "constraint": [0, 1, 3, 4], "strict": [0, 4], "token": [0, 1, 2, 4], "both": [0, 2], "input": [0, 2, 3, 4], "requir": [0, 3, 4], "care": [0, 2, 4], "chunk": [0, 1], "strategi": [0, 1, 2, 3], "long": [0, 1, 2, 4], "form": [0, 1, 2, 4], "effect": [0, 2, 3, 4], "tradit": 0, "softwar": [0, 4], "methodologi": [0, 2, 4], "break": [0, 2, 3], "down": [0, 2, 3], "deal": 0, "determinist": [0, 1, 4], "gener": [0, 1, 4], "new": [0, 2, 3, 4], "hallucin": [0, 2, 4], "These": [0, 2, 3, 4], "can": [0, 2, 3, 4], "plausibl": 0, "sound": 0, "entir": [0, 2, 3], "fabric": [0, 2], "inform": [0, 2, 3, 4], "creat": [0, 2, 3, 4], "signific": [0, 2, 3, 4], "risk": [0, 2, 3], "safeti": [0, 2, 4], "secur": [0, 2, 3, 4], "harm": [0, 2], "bias": [0, 2, 4], "inappropri": 0, "safeguard": [0, 2], "monitor": [0, 1], "ensur": [0, 2, 3, 4], "safe": [0, 4], "deploy": [0, 1, 2, 4], "cost": [0, 2, 4], "optim": [0, 1, 2, 3], "The": [0, 3, 4], "financi": [0, 2, 3, 4], "oper": [0, 2, 3], "base": [0, 1, 4], "quickli": [0, 3], "becom": [0, 2], "prohibit": 0, "without": [0, 2, 3, 4], "observ": [0, 2, 4], "vendor": [0, 1, 2], "lock": [0, 1], "cloud": [0, 2, 4], "provid": 
[0, 2, 3], "depend": [0, 2, 4], "through": [0, 1, 2, 3, 4], "proprietari": [0, 4], "infrastructur": 0, "difficult": [0, 2], "switch": 0, "self": [0, 1, 2], "host": [0, 1, 2], "take": [0, 1, 2, 3, 4], "hand": [0, 3, 4], "concret": [0, 1], "you": [0, 2, 3, 4], "run": [0, 2, 4], "modifi": 0, "real": [0, 2, 3, 4], "world": [0, 2, 4], "scenario": [0, 2, 4], "best": [0, 1, 2], "techniqu": [0, 1, 2, 3], "pattern": [0, 1, 2, 4], "anti": 0, "look": [0, 1, 2], "our": [0, 2, 3, 4], "goal": [0, 2, 3], "discourag": 0, "us": [0, 3, 4], "enabl": [0, 2, 3, 4], "By": [0, 1, 2, 3, 4], "understand": [0, 1, 2, 3, 4], "upfront": [0, 1], "better": [0, 1, 2, 3], "equip": [0, 1], "avoid": [0, 2, 4], "current": [0, 1, 2, 3, 4], "discours": [0, 1], "around": [0, 1, 2, 3, 4], "tend": [0, 1, 2], "toward": [0, 2, 4], "extrem": 0, "either": [0, 2, 3], "uncrit": 0, "enthusiasm": 0, "wholesal": 0, "dismiss": 0, "differ": [0, 2, 3, 4], "focu": [0, 1, 2, 3, 4], "rather": [0, 2], "than": [0, 2], "theoret": 0, "examin": [0, 3, 4], "first": [0, 2, 3, 4], "everi": 0, "concept": [0, 2], "illustr": [0, 2, 3], "execut": [0, 2], "immedi": 0, "analysi": [0, 1, 2, 3], "balanc": [0, 2, 3, 4], "help": [0, 2, 3, 4], "reader": [0, 1], "decis": [0, 2, 4], "intend": 0, "develop": [0, 2, 3, 4], "step": [0, 1, 2, 4], "insight": [0, 2, 3, 4], "along": [0, 2], "guidanc": [0, 4], "framework": [0, 2], "could": [0, 2, 3, 4], "derail": 0, "project": 0, "earli": [0, 4], "befor": [0, 2, 4], "thei": [0, 2, 3, 4], "costli": 0, "problem": [0, 1], "too": [0, 2, 3], "late": 0, "lifecycl": 0, "design": [0, 1, 3, 4], "lead": [0, 2, 3, 4], "genai": 0, "initi": [0, 2, 3], "leader": 0, "architectur": [0, 2, 3], "advoc": 0, "anyon": 0, "seek": [0, 2], "work": [0, 1, 2, 3, 4], "typic": [0, 2, 3], "job": 0, "role": [0, 2, 3, 4], "platform": [0, 2, 3, 4], "backend": [0, 2], "exist": [0, 2], "ml": 0, "transit": [0, 2, 3], "overse": 0, "motiv": [0, 2, 4], "need": [0, 2, 3], "readi": [0, 2], "desir": [0, 2, 4], "perform": [0, 1, 2, 3, 4], "after": [0, 2, 3], "read": [0, 2, 3, 4], "implic": [0, 1, 2], "experi": [0, 2, 3, 4], "recommend": [0, 2, 3, 4], "abl": [0, 3, 4], "deploi": [0, 3], "proper": [0, 4], "realist": 0, "effort": [0, 2, 4], "estim": [0, 2], "impact": [0, 2, 3, 4], "timelin": 0, "To": [0, 2, 3, 4], "most": [0, 2, 3, 4], "should": [0, 2, 3, 4], "basic": [0, 2, 3], "program": [0, 2], "knowledg": [0, 2], "introductori": [0, 1], "langchain": [0, 1, 3], "e": [0, 2, 3, 4], "g": [0, 2, 3, 4], "chat": [0, 2, 3, 4], "prompt": [0, 1, 2], "templat": [0, 1, 2], "access": [0, 2, 3, 4], "openai": [0, 2, 4], "anthrop": [0, 4], "similar": [0, 2, 4], "grade": 0, "dive": 0, "here": [0, 2, 3, 4], "get": [0, 2, 3, 4], "start": [0, 2, 4], "activ": [0, 2], "virtual": [0, 2], "m": [0, 2], "venv": 0, "env": [0, 2, 3, 4], "bin": 0, "On": [0, 4], "window": [0, 1], "script": 0, "instal": [0, 2, 4], "packag": 0, "pip": [0, 2, 4], "r": [0, 2, 3, 4], "txt": [0, 2, 3, 4], "file": [0, 2, 3, 4], "root": 0, "directori": [0, 2], "add": [0, 3], "other": [0, 2, 3, 4], "sensit": [0, 2], "openai_api_kei": 0, "your_openai_api_key_her": 0, "never": 0, "share": [0, 2, 4], "commit": [0, 2], "version": [0, 2, 4], "control": [0, 2, 4], "contain": [0, 2, 3, 4], "kept": [0, 2], "privat": [0, 2], "clone": 0, "companion": 0, "git": 0, "http": [0, 2, 3, 4], "com": [0, 2, 3, 4], "souzatharsi": 0, "tamingllm": 0, "cd": 0, "If": [0, 2, 4], "encount": [0, 1, 2], "rate": [0, 2], "consid": [0, 2, 3, 4], "smaller": [0, 2, 3, 4], "retri": [0, 4], "logic": [0, 2, 3], "conflict": 0, "try": [0, 2, 4], "fresh": 
0, "like": [0, 2, 3, 4], "poetri": 0, "check": [0, 2], "page": 0, "known": [0, 2, 4], "now": [0, 2, 3, 4], "let": [0, 2, 3, 4], "begin": [0, 2], "explor": [0, 2, 4], "dr": 0, "tharsi": 0, "souza": 0, "scientist": 0, "special": [0, 2, 4], "he": [0, 2], "lectur": 0, "columbia": 0, "univers": 0, "master": [0, 4], "scienc": [0, 2], "appli": [0, 2, 3], "analyt": 0, "head": [0, 3], "equiti": 0, "citadel": 0, "former": [0, 2], "senior": 0, "vp": 0, "two": [0, 2, 3, 4], "sigma": 0, "invest": [0, 2, 4], "With": [0, 2], "over": [0, 1, 2, 3, 4], "15": [0, 2, 4], "deliv": 0, "across": [0, 2, 4], "startup": 0, "fortun": 0, "500": [0, 2], "compani": [0, 2, 3, 4], "global": 0, "also": [0, 2, 3, 4], "an": [0, 1, 2, 3, 4], "numer": [0, 2], "scholarli": 0, "frequent": [0, 4], "speaker": 0, "academ": [0, 2], "busi": [0, 2], "confer": [0, 4], "ground": [0, 1, 2], "background": [0, 3], "draw": [0, 2, 4], "scale": [0, 2, 4], "stage": 0, "major": [0, 2, 4], "institut": 0, "well": [0, 2, 4], "advis": 0, "profit": [0, 2, 3, 4], "organ": [0, 2, 3], "contribut": [0, 2, 3], "uniqu": [0, 2], "bridg": 0, "gap": 0, "between": [0, 2, 3, 4], "potenti": [0, 2, 3, 4], "next": [0, 2, 4], "hold": 0, "ph": 0, "d": [0, 2, 4], "ucl": 0, "london": 0, "phil": 0, "sc": 0, "b": [0, 2, 4], "abstract": [1, 2, 4], "heavili": [1, 2, 4], "gloss": 1, "fundament": [1, 2, 4], "challeng": [1, 2, 3, 4], "convers": [1, 2, 3, 4], "thi": [1, 2, 3, 4], "book": 1, "kei": [1, 4], "python": [1, 2, 3, 4], "proven": 1, "yet": [1, 2, 3], "i": [1, 2, 3, 4], "unstructur": [1, 4], "context": [1, 2, 3, 4], "code": [1, 2, 4], "sidestep": 1, "inher": [1, 2, 3, 4], "core": [1, 2], "we": [1, 2, 3, 4], "ll": [1, 2], "address": [1, 2, 3, 4], "approach": [1, 2, 3, 4], "note": [1, 2, 3, 4], "perspect": 1, "who": [1, 2, 3, 4], "For": [1, 2, 3, 4], "outcom": [1, 2, 4], "prerequisit": 1, "set": [1, 2, 3, 4], "up": [1, 2, 3, 4], "your": [1, 2, 3, 4], "environ": [1, 2, 3, 4], "setup": [1, 2, 4], "api": [1, 2], "configur": [1, 2], "repositori": [1, 2], "troubleshoot": 1, "common": [1, 2, 3, 4], "issu": [1, 2, 3, 4], "about": [1, 2, 3, 4], "author": [1, 4], "": [1, 2, 3, 4], "statement": 1, "One": [1, 2], "shot": [1, 2], "json": [1, 2, 3], "mode": 1, "outlin": [1, 2], "multipl": [1, 2, 3, 4], "choic": [1, 2, 4], "pydant": [1, 2, 4], "discuss": [1, 2], "compar": [1, 2, 3], "research": [1, 2, 3], "ongo": [1, 2], "debat": 1, "conclus": [1, 2], "acknowledg": [1, 2], "refer": 1, "content": 1, "what": [1, 2, 4], "ar": [1, 2, 4], "contextu": [1, 2], "link": 1, "write": [1, 2, 4], "construct": [1, 2, 4], "dynam": [1, 2], "paramet": [1, 2, 4], "report": [1, 2, 4], "usag": [1, 2, 4], "futur": [1, 2], "consider": [1, 4], "machin": 1, "temperatur": [1, 3, 4], "sampl": [1, 3, 4], "spectrum": 1, "properti": 1, "conceptu": [1, 4], "overview": [1, 4], "compon": [1, 2], "metric": 1, "evalu": [1, 3, 4], "human": [1, 3, 4], "benchmark": 1, "leaderboard": 1, "type": [1, 2, 3, 4], "detect": [1, 2, 4], "retriev": [1, 2], "augment": [1, 2], "rag": 1, "select": [1, 2], "index": [1, 2, 3], "vector": 1, "store": [1, 2, 3], "method": [1, 2, 3, 4], "pipelin": [1, 2, 4], "valid": [1, 2, 4], "guard": 1, "filter": [1, 2], "sanit": 1, "alert": 1, "cach": [1, 2], "invalid": [1, 4], "predict": [1, 2, 4], "llama": [1, 2, 4], "llamafil": 1, "ollama": 1, "migrat": 1, "commun": [1, 2, 4], "doesn": [2, 3, 4], "t": [2, 3, 4], "matter": 2, "beauti": 2, "theori": 2, "smart": 2, "agre": 2, "wrong": 2, "richard": 2, "feynman": 2, "natur": [2, 3, 4], "unlik": 2, "where": [2, 3, 4], "same": [2, 3, 4], "produc": [2, 
4], "novel": 2, "text": [2, 3, 4], "train": [2, 4], "data": [2, 3, 4], "respons": [2, 3, 4], "each": [2, 3], "time": [2, 3, 4], "re": [2, 3, 4], "queri": 2, "even": [2, 3, 4], "ident": 2, "behavior": 2, "strength": 2, "ask": [2, 4], "question": [2, 4], "isn": 2, "bug": 2, "featur": [2, 4], "random": 2, "allow": [2, 3, 4], "creativ": [2, 4], "divers": [2, 3, 4], "testabl": 2, "servic": [2, 3, 4], "advic": 2, "mean": [2, 3, 4], "yield": 2, "exceedingli": 2, "regulatori": 2, "complianc": [2, 4], "guarante": [2, 4], "user": [2, 3], "trust": [2, 4], "affect": 2, "inconsist": [2, 4], "primari": 2, "determin": [2, 3, 4], "come": [2, 3, 4], "dure": [2, 4], "calcul": 2, "probabl": [2, 4], "distribut": [2, 4], "nucleu": 2, "holtzman": 2, "et": [2, 4], "al": [2, 4], "2020": 2, "top": [2, 4], "k": [2, 3, 4], "coher": [2, 3], "0": [2, 3, 4], "repetit": [2, 3, 4], "1": [2, 4], "increas": [2, 3, 4], "incoher": 2, "dotenv": [2, 3, 4], "import": [2, 3, 4], "load_dotenv": [2, 3, 4], "o": [2, 3, 4], "load": [2, 3, 4], "variabl": [2, 3, 4], "panda": 2, "pd": 2, "def": [2, 3, 4], "generate_respons": 2, "model_nam": [2, 3], "str": [2, 3, 4], "float": [2, 3], "attempt": [2, 3], "int": [2, 3], "3": [2, 4], "datafram": 2, "demonstr": [2, 3, 4], "client": [2, 4], "result": [2, 3, 4], "temp": 2, "rang": [2, 3, 4], "complet": [2, 3, 4], "messag": [2, 4], "max_token": 2, "50": 2, "append": [2, 3, 4], "displai": [2, 4], "group": [2, 3], "df_result": 2, "print": [2, 3, 4], "f": [2, 3, 4], "ntemperatur": 2, "40": 2, "temp_respons": 2, "_": 2, "row": 2, "iterrow": 2, "return": [2, 3, 4], "max_length": [2, 4], "10000": [2, 3, 4], "appl": [2, 3, 4], "sec_fil": [2, 4], "gpt": [2, 3, 4], "5": [2, 3, 4], "turbo": [2, 3, 4], "singl": [2, 3, 4], "summari": [2, 4], "2": [2, 4], "inc": [2, 3, 4], "its": [2, 3, 4], "10": [2, 3, 4], "fiscal": [2, 3], "end": [2, 3], "septemb": [2, 3], "28": [2, 3], "2024": [2, 3, 4], "sec": [2, 3, 4], "detail": [2, 3, 4], "season": 2, "issuer": 2, "california": [2, 4], "manufactur": 2, "market": [2, 3, 4], "smartphon": 2, "person": [2, 4], "tablet": 2, "wearabl": [2, 4], "accessori": 2, "innov": [2, 3], "condit": 2, "exchang": [2, 3, 4], "commiss": [2, 3, 4], "factor": [2, 3, 4], "invdestacksmeticsisdict": 2, "setispect": 2, "20cyan": 2, "evaluationseld": 2, "anvis": 2, "droitent": 2, "discernminerv": 2, "versbobprefvers": 2, "vo\u8be5": 2, "option\u548c": 2, "meio": 2, "forecast": 2, "\u0432\u0440\u0435\u043ccisco": 2, "dellaischenpoihscap": 2, "geme": 2, "gettim": 2, "simpl": [2, 3, 4], "dramat": [2, 4], "alter": 2, "wai": [2, 3, 4], "systemat": [2, 4], "assess": [2, 3], "At": 2, "rigid": 2, "vari": 2, "less": 2, "wildli": 2, "often": [2, 3, 4], "inadequ": 2, "one": [2, 3, 4], "radic": 2, "reli": 2, "u": [2, 4], "grappl": 2, "probabilist": 2, "lower": [2, 4], "seem": [2, 4], "safer": 2, "don": [2, 3, 4], "necessarili": 2, "elimin": 2, "underli": [2, 4], "uncertainti": 2, "highlight": [2, 3, 4], "paradigm": 2, "aspect": [2, 3, 4], "beyond": 2, "present": [2, 3, 4], "anoth": 2, "fascin": 2, "abil": [2, 4], "spontan": 2, "aris": 2, "answer": [2, 3, 4], "reason": [2, 3, 4], "aren": 2, "explicitli": 2, "grow": [2, 4], "against": 2, "clear": [2, 4], "wei": 2, "2022": 2, "fig": [2, 3, 4], "4": 2, "relationship": 2, "linear": 2, "below": [2, 3, 4], "certain": [2, 3, 4], "threshold": 2, "absent": 2, "simpli": [2, 3, 4], "much": 2, "coax": 2, "out": [2, 3], "onc": [2, 3], "reach": [2, 3, 4], "point": [2, 3], "journei": 2, "suddenli": 2, "manifest": 2, "call": [2, 3, 4], "phase": 2, "shift": 2, "inabl": 2, 
"unpredict": [2, 4], "stand": 2, "stark": 2, "contrast": 2, "deliber": 2, "press": 2, "convent": 2, "stabl": 2, "suit": 2, "defin": [2, 3, 4], "accept": 2, "criteria": 2, "contend": 2, "constantli": 2, "7b": 2, "70b": 2, "ha": [2, 4], "rethink": 2, "practic": [2, 3], "math": 2, "tutor": 2, "children": 2, "would": [2, 3, 4], "verifi": [2, 4], "function": [2, 3, 4], "But": [2, 4], "just": [2, 3, 4], "predefin": [2, 4], "adapt": [2, 3], "explan": [2, 4], "child": 2, "level": [2, 3, 4], "engag": [2, 4], "ag": 2, "appropri": [2, 3, 4], "bound": 2, "rais": [2, 3], "measur": 2, "weren": 2, "evolv": [2, 3], "accuraci": [2, 4], "subject": 2, "qualiti": [2, 3, 4], "kind": 2, "There": [2, 3, 4], "account": 2, "tabl": [2, 3, 4], "sever": [2, 3, 4], "dimens": 2, "pre": 2, "extend": [2, 4], "explicit": [2, 4], "usual": 2, "precis": [2, 4], "involv": [2, 4], "resist": 2, "straightforward": [2, 3, 4], "quantif": 2, "score": [2, 4], "judgment": 2, "remain": [2, 3], "contamin": 2, "carefulli": [2, 4], "craft": [2, 4], "case": [2, 3, 4], "expect": [2, 3, 4], "unit": [2, 3, 4], "massiv": 2, "internet": 2, "alreadi": 2, "seen": 2, "memor": 2, "artifici": 2, "inflat": 2, "curat": 2, "truli": 2, "unseen": 2, "rigor": 2, "cross": 2, "evolut": 2, "continu": [2, 3, 4], "advanc": [2, 3, 4], "longitudin": 2, "comparison": 2, "obsolet": 2, "older": 2, "autom": [2, 4], "demand": [2, 4], "oversight": 2, "annot": 2, "review": [2, 4], "process": [2, 3, 4], "mostli": [2, 4], "distinct": 2, "versu": 2, "latter": 2, "foundat": [2, 3], "purpos": [2, 4], "tailor": 2, "particular": [2, 4], "combin": [2, 3, 4], "associ": [2, 3, 4], "solv": [2, 4], "That": [2, 4], "differenti": 2, "becaus": 2, "chang": 2, "scope": [2, 3], "includ": [2, 3, 4], "thing": [2, 4], "instead": [2, 3, 4], "meet": [2, 4], "align": [2, 3, 4], "object": [2, 4], "A": [2, 3, 4], "great": [2, 4], "categori": 2, "why": [2, 4], "misinform": 2, "prevent": [2, 4], "factual": 2, "databas": [2, 4], "citat": 2, "tempor": 2, "scientif": 2, "fals": [2, 4], "reduc": [2, 3, 4], "legal": 2, "reput": 2, "support": [2, 4], "protect": 2, "manipul": 2, "unqualifi": 2, "recognit": 2, "medic": 2, "disclaim": 2, "profession": [2, 4], "referr": 2, "mechan": 2, "boundari": 2, "situat": [2, 3], "incorrect": 2, "liabil": 2, "vulner": 2, "standard": 2, "expertis": 2, "util": [2, 3], "bia": [2, 4], "gender": 2, "racial": 2, "cultur": 2, "demograph": 2, "represent": [2, 3], "inclus": [2, 3, 4], "stereotyp": 2, "fair": 2, "reinforc": 2, "societ": 2, "equal": 2, "social": 2, "brand": 2, "privaci": [2, 4], "pii": 2, "anonym": 2, "leakag": 2, "carryov": 2, "regul": [2, 4], "protocol": 2, "confidenti": 2, "breach": 2, "cognit": 2, "multi": [2, 4], "mathemat": 2, "fallaci": 2, "causal": 2, "edg": 2, "think": 2, "mainten": 2, "idiom": 2, "sarcasm": 2, "terminologi": 2, "lingual": 2, "misunderstand": 2, "sophist": [2, 3], "syntax": 2, "scan": 2, "document": [2, 3, 4], "compat": [2, 4], "stabil": 2, "effici": [2, 3, 4], "debt": 2, "scalabl": [2, 3], "failur": 2, "meta": [2, 3], "correct": [2, 4], "feedback": [2, 4], "overconfid": 2, "improv": [2, 3, 4], "clariti": [2, 3, 4], "audienc": 2, "densiti": 2, "transfer": 2, "satisfact": [2, 4], "ethic": 2, "request": [2, 3, 4], "incid": 2, "misus": 2, "moral": 2, "valu": [2, 3, 4], "transpar": [2, 4], "stakehold": 2, "environment": 2, "co2": 2, "emiss": 2, "energi": 2, "consumpt": 2, "per": [2, 3], "server": [2, 4], "locat": 2, "batch": 2, "hardwar": 2, "infer": 2, "sustain": 2, "corpor": 2, "three": 2, "app": 2, "imag": 2, "audio": 2, "etc": [2, 4], 
"truth": [2, 4], "option": [2, 3, 4], "layer": [2, 3, 4], "repres": [2, 4], "palm": 2, "individu": [2, 3], "target": [2, 4], "further": [2, 3, 4], "see": [2, 4], "avail": [2, 3, 4], "addition": 2, "shown": 2, "fix": [2, 3], "all": [2, 3, 4], "default": [2, 4], "quantifi": 2, "rank": 2, "easi": [2, 3], "addit": [2, 3, 4], "quantit": 2, "among": 2, "aggreg": 2, "plan": [2, 4], "pertain": 2, "previous": [2, 3, 4], "introduc": [2, 3, 4], "doe": [2, 3, 4], "good": [2, 4], "ani": [2, 3, 4], "separ": [2, 3], "synthet": [2, 4], "updat": [2, 3, 4], "reflect": 2, "post": [2, 4], "launch": 2, "timeout": 2, "variat": 2, "maxim": 2, "success": [2, 4], "inter": 2, "rater": 2, "weight": 2, "rel": 2, "priorit": 2, "normal": [2, 4], "absolut": [2, 4], "fail": 2, "confid": [2, 4], "interv": 2, "ti": 2, "veri": 2, "close": 2, "tier": 2, "holist": 2, "built": [2, 4], "mind": 2, "x": 2, "fast": 2, "particularli": [2, 3, 4], "promot": 2, "rapid": 2, "experiment": [2, 4], "iter": [2, 3, 4], "final": [2, 3, 4], "keep": [2, 3], "itself": 2, "confirm": 2, "vi": 2, "later": [2, 4], "chapter": 2, "categor": [2, 4], "broad": [2, 4], "intrins": 2, "extrins": 2, "sequenc": [2, 4], "perplex": 2, "given": [2, 3, 4], "variou": [2, 3, 4], "downstream": [2, 4], "directli": [2, 4], "valuabl": [2, 4], "interest": [2, 3, 4], "sinc": [2, 3, 4], "term": [2, 3], "discrimin": 2, "distinguish": 2, "classifi": 2, "sentiment": [2, 4], "classif": [2, 4], "identifi": [2, 3, 4], "whether": [2, 3, 4], "true": [2, 3, 4], "synthesi": 2, "summar": [2, 3], "log": 2, "discret": 2, "recal": 2, "f1": 2, "match": [2, 4], "exact": 2, "prefix": 2, "translat": 2, "roug": 2, "bleu": 2, "charact": [2, 3, 4], "n": [2, 3], "gram": 2, "characterist": 2, "being": [2, 4], "short": [2, 3, 4], "wide": [2, 3, 4], "definit": [2, 4], "bilingu": 2, "understudi": 2, "overlap": [2, 3], "favor": [2, 4], "due": [2, 3], "breviti": 2, "penalti": 2, "insensit": 2, "semant": [2, 3], "high": [2, 3], "orient": 2, "gist": 2, "word": [2, 3, 4], "sentenc": [2, 3, 4], "ignor": 2, "equival": 2, "influenc": [2, 4], "meteor": 2, "synonym": 2, "stem": [2, 4], "paraphras": 2, "alongsid": 2, "computation": [2, 3], "expens": 2, "cider": 2, "consensu": 2, "descript": [2, 4], "tf": 2, "idf": 2, "caption": 2, "outsid": 2, "reliant": 2, "corpu": 2, "statist": 2, "ter": 2, "edit": 2, "number": [2, 3, 4], "convert": [2, 4], "hypothesi": 2, "penal": 2, "bertscor": 2, "embed": [2, 3], "bert": 2, "spice": 2, "proposit": 2, "scene": 2, "graph": 2, "emphasi": 2, "onli": [2, 3, 4], "pure": 2, "textual": 2, "As": [2, 3, 4], "analyst": [2, 3], "prepar": [2, 3], "dictionari": [2, 4], "rouge_1": 2, "rouge_2": 2, "ideal": [2, 4], "expert": [2, 3, 4], "cheaper": 2, "4o": [2, 3, 4], "mini": [2, 3, 4], "evaluate_summari": 2, "unigram": 2, "bigram": 2, "huggingfac": 2, "librari": [2, 3, 4], "absl": 2, "py": 2, "rouge_scor": 2, "generated_summari": 2, "reference_summari": 2, "arg": [2, 3, 4], "dict": [2, 3, 4], "google_bleu": 2, "bleu_scor": 2, "rouge1": 2, "rouge2": 2, "instanc": [2, 3], "arbitrari": 2, "chosen": 2, "sentence1": 2, "cat": 2, "sat": 2, "mat": 2, "sentence2": 2, "ate": 2, "3333333333333333": 2, "7272727272727272": 2, "4444444444444445": 2, "generate_summari": 2, "summir": 2, "correspond": [2, 4], "liner": 2, "excerpt": 2, "evaluate_summary_model": 2, "model_benchmark": 2, "models_test": 2, "benchmark_summari": 2, "model_summari": 2, "evaluation_result": 2, "line": 2, "name": [2, 3, 4], "zip": 2, "annual": 2, "stock": [2, 4], "govern": 2, "forward": 2, "reveal": 2, "analyz": [2, 3, 4], 
"statu": 2, "concis": 2, "omit": [2, 4], "essenti": [2, 3, 4], "element": [2, 4], "Its": 2, "adequ": 2, "verbos": 2, "relev": 2, "peripher": 2, "quit": [2, 4], "overli": [2, 4], "simplifi": [2, 4], "miss": 2, "convei": [2, 3], "breadth": 2, "Of": 2, "cours": 2, "abov": 2, "vibe": 2, "visualize_prompt_comparison": 2, "visual": 2, "matplotlib": 2, "radar": 2, "plot": 2, "radar_plot": 2, "show": [2, 3, 4], "tmp": 2, "ipykernel_1652501": 2, "940173201": 2, "userwarn": 2, "figurecanvasagg": 2, "thu": 2, "put": 2, "closest": 2, "largest": 2, "deviat": [2, 4], "suggest": [2, 4], "least": 2, "establish": 2, "otherwis": 2, "qualit": 2, "driven": 2, "mention": [2, 4], "might": [2, 3, 4], "fulli": [2, 3], "nuanc": [2, 3, 4], "especi": [2, 3, 4], "those": [2, 3, 4], "primarili": 2, "granular": [2, 3], "altern": [2, 3], "section": [2, 3, 4], "fall": 2, "judg": 2, "themselv": 2, "act": 2, "paper": [2, 4], "main": [2, 3, 4], "instruct": [2, 3, 4], "fine": [2, 4], "tune": [2, 4], "assign": 2, "likert": 2, "style": 2, "pairwis": 2, "ensembl": 2, "repeatedli": 2, "domain": 2, "procedur": 2, "fluenci": 2, "interpret": 2, "refin": 2, "excel": [2, 4], "narr": 2, "flow": [2, 3], "mirror": 2, "guidelin": 2, "express": [2, 4], "similarli": 2, "notabl": [2, 4], "properli": [2, 4], "henc": 2, "worth": 2, "integ": 2, "rubric": 2, "foster": [2, 4], "hollist": 2, "judgeevalu": 2, "enforc": [2, 4], "four": 2, "grammar": [2, 4], "evaluate_with_llm": 2, "candid": 2, "pars": [2, 4], "criterion": 2, "basemodel": [2, 4], "class": [2, 3, 4], "judge_model": 2, "candidate_summari": 2, "specifi": [2, 3, 4], "wa": [2, 4], "written": 2, "grammat": 2, "y": 2, "z": 2, "w": [2, 3], "beta": [2, 4], "response_format": [2, 4], "Then": 2, "benchmark_model": 2, "test_model": 2, "input_text": [2, 3], "tupl": 2, "iphon": [2, 4], "mac": [2, 4], "ipad": [2, 4], "incorpor": 2, "regard": 2, "obtain": [2, 4], "respect": 2, "regist": 2, "approxim": [2, 4], "6": [2, 3, 4], "trillion": [2, 4], "held": [2, 4], "affili": [2, 4], "billion": 2, "outstand": [2, 4], "octob": [2, 4], "18": [2, 4], "7": [2, 3], "8": [2, 3], "evals_list": 2, "1775618912": 2, "14": [2, 4], "some": [2, 3, 4], "achiev": [2, 4], "variant": 2, "slightli": 2, "indic": [2, 4], "drift": 2, "had": 2, "lowest": 2, "overal": [2, 3, 4], "drop": 2, "substanti": 2, "gradient": 2, "visibl": 2, "mark": 2, "degrad": [2, 4], "firstli": 2, "overhead": 2, "neglect": 2, "exhibit": 2, "prefer": [2, 4], "posit": [2, 3, 4], "egocentr": 2, "tight": 2, "small": [2, 4], "field": [2, 4], "financ": 2, "law": 2, "suitabl": 2, "serv": [2, 3, 4], "aproach": 2, "significantli": [2, 3], "workflow": [2, 4], "assessor": 2, "intens": [2, 4], "aplic": 2, "aim": [2, 3, 4], "clearli": [2, 4], "earlier": 2, "depict": [2, 4], "higher": 2, "correl": 2, "were": [2, 4], "multilingu": 2, "golden": 2, "recruit": 2, "languang": 2, "arena": 2, "vote": 2, "made": [2, 3, 4], "under": [2, 4], "blind": 2, "randomli": 2, "pair": 2, "submit": 2, "actual": [2, 3, 4], "loop": 2, "customiz": 2, "irrelev": 2, "unhelp": 2, "sometim": [2, 4], "though": [2, 4], "occasion": 2, "regularli": 2, "inquiri": 2, "rare": 2, "inaccuraci": 2, "highli": [2, 4], "perfectli": 2, "cater": 2, "polici": 2, "benefit": [2, 4], "critiqu": 2, "elo": 2, "democrat": [2, 4], "simul": [2, 4], "thought": [2, 4], "exam": 2, "probe": 2, "certifi": 2, "identif": 2, "histori": 2, "prioriti": 2, "intellig": 2, "move": [2, 3], "began": 2, "2018": 2, "introduct": [2, 3], "glue": 2, "wang": 2, "2019": 2, "entail": 2, "baselin": 2, "superglu": 2, "expand": 2, 
"deeper": [2, 3], "successor": 2, "grew": 2, "broader": 2, "big": 2, "bench": 2, "srivastava": 2, "2023": 2, "turn": 2, "200": 2, "span": 2, "arithmet": 2, "collabor": 2, "truthfulqa": 2, "lin": [2, 4], "accur": [2, 4], "decept": 2, "increasingli": [2, 4], "multitask": 2, "hendryck": 2, "2021": 2, "multidisciplinari": 2, "57": 2, "stanford": 2, "helm": 2, "liang": 2, "multidimension": 2, "concern": 2, "surround": [2, 4], "emphas": [2, 4], "humanev": 2, "chen": [2, 4], "lmsy": 2, "brought": 2, "dialogu": 2, "len": [2, 3], "replic": [2, 4], "find": [2, 3, 4], "industri": [2, 4], "chatbot": 2, "chiang": 2, "direct": 2, "live": 2, "gather": 2, "000": [2, 4], "assist": [2, 4], "alpacaev": 2, "duboi": 2, "mt": 2, "zheng": 2, "Their": [2, 4], "newer": 2, "render": 2, "ineffect": 2, "crowdsourc": 2, "own": [2, 3], "livebench": 2, "white": 2, "resili": 2, "competit": 2, "free": 2, "70": 2, "meaningfulli": 2, "monthli": 2, "came": 2, "center": [2, 4], "arc": 2, "prize": 2, "chollet": 2, "mike": 2, "knoop": 2, "co": 2, "founder": 2, "zapier": 2, "fran\u00e7oi": 2, "creator": 2, "agi": 2, "kera": 2, "narrow": 2, "suffici": [2, 4], "meaning": [2, 3, 4], "capac": 2, "genuin": 2, "accord": 2, "econom": 2, "acquir": 2, "skill": 2, "five": 2, "old": 2, "possess": 2, "count": [2, 3], "elementari": 2, "physic": 2, "novelti": 2, "puzzl": 2, "someth": 2, "wouldn": 2, "vast": 2, "interpol": 2, "memori": [2, 3], "synthes": 2, "fly": 2, "brute": 2, "possibl": [2, 4], "million": 2, "seri": 2, "minim": [2, 4], "prior": 2, "submiss": 2, "pixel": 2, "perfect": 2, "color": 2, "unbeaten": 2, "decemb": 2, "minimum": 2, "85": 2, "win": 2, "deep": 2, "poorli": 2, "relat": 2, "lack": [2, 4], "recombin": 2, "spur": 2, "state": [2, 3, 4], "art": 2, "33": 2, "55": 2, "takeawai": 2, "algorithm": 2, "exceed": 2, "modern": [2, 3, 4], "fourrier": 2, "lightweight": [2, 4], "varieti": 2, "bespok": 2, "via": [2, 4], "sdk": 2, "cli": 2, "been": 2, "extract": [2, 3, 4], "autoregress": 2, "conduct": 2, "sub": 2, "liter": 2, "disturb": 2, "zero": 2, "varianc": 2, "yt": 2, "ut": 2, "uncondit": 2, "suppos": 2, "p": 2, "08": [2, 4], "exactli": [2, 4], "ii": [2, 4], "iv": 2, "iii": 2, "c": [2, 4], "consequ": 2, "ol": 2, "heteroscedast": 2, "regress": 2, "ineffici": 2, "wish": 2, "lag": 2, "var": 2, "bivari": 2, "acceler": 2, "evaluation_track": 2, "evaluationtrack": 2, "model_config": 2, "basemodelconfig": 2, "parallelismmanag": 2, "pipelineparamet": 2, "envconfig": 2, "is_accelerate_avail": 2, "datetim": 2, "timedelta": 2, "initprocessgroupkwarg": 2, "create_evaluation_pipelin": 2, "output_dir": 2, "cache_dir": 2, "pretrain": 2, "dtype": 2, "float16": 2, "max_sampl": 2, "kwargs_handl": 2, "second": [2, 3], "3000": 2, "els": [2, 3], "none": 2, "save_detail": 2, "push_to_hub": 2, "pipeline_param": 2, "launcher_typ": 2, "env_config": 2, "override_batch_s": 2, "use_chat_templ": 2, "trust_remote_cod": 2, "pipeline_paramet": 2, "schemat": [2, 3], "vllm": [2, 4], "tgi": 2, "instanti": 2, "storag": 2, "local": [2, 3, 4], "track": 2, "push": 2, "hub": 2, "parallel": 2, "temporari": 2, "maximum": [2, 3], "num_few_shot": 2, "automat": 2, "string": [2, 4], "vertic": 2, "bar": 2, "few": [2, 3, 4], "binari": 2, "flag": 2, "bigbench": 2, "winogrand": 2, "hellaswag": 2, "nlp": 2, "choos": 2, "1b": 2, "save": [2, 3], "save_and_push_result": 2, "show_result": 2, "model_arg": 2, "download": 2, "remot": 2, "send": [2, 4], "serverless": 2, "dedic": [2, 4], "id": 2, "inference_server_address": 2, "inference_server_auth": 2, "model_id": 2, "null": 2, "bash": 2, 
"command": 2, "model_config_path": 2, "path": [2, 3], "endpoint_model": 2, "yaml": [2, 4], "llama3": [2, 3], "qwen2": [2, 4], "smollm2": 2, "describ": 2, "3b": 2, "alibaba": [2, 4], "5b": [2, 4], "hui": 2, "yang": 2, "compact": 2, "360m": 2, "allal": 2, "9": 2, "trend": [2, 4], "cluster": 2, "degre": 2, "noteworthi": 2, "superior": 2, "taken": [2, 4], "grain": [2, 4], "salt": [2, 4], "100": [2, 4], "give": 2, "trade": [2, 4], "off": [2, 3, 4], "flexibl": [2, 3, 4], "exponenti": 2, "growth": 2, "hug": [2, 4], "ecosystem": 2, "modular": 2, "visit": 2, "offici": 2, "alb": 2, "24": [2, 4], "loubna": 2, "ben": 2, "anton": 2, "lozhkov": 2, "eli": 2, "bakouch": 2, "gabriel": 2, "mart\u00edn": 2, "bl\u00e1zquez": 2, "lewi": 2, "tunstal": 2, "agust\u00edn": 2, "piquer": 2, "andr": 2, "marafioti": 2, "cyril": 2, "zakka": 2, "leandro": 2, "von": 2, "werra": 2, "thoma": 2, "wolf": 2, "are24": 2, "judgearena": 2, "ctj": 2, "21": 2, "jerri": 2, "tworek": 2, "heewoo": 2, "jun": 2, "qime": 2, "yuan": 2, "henriqu": 2, "pond": 2, "de": 2, "oliveira": 2, "pinto": 2, "jare": 2, "kaplan": 2, "harri": 2, "edward": 2, "yuri": 2, "burda": 2, "nichola": 2, "joseph": 2, "greg": 2, "brockman": 2, "alex": 2, "rai": 2, "raul": 2, "puri": 2, "gretchen": 2, "krueger": 2, "michael": [2, 4], "petrov": 2, "heidi": 2, "khlaaf": 2, "girish": 2, "sastri": 2, "pamela": 2, "mishkin": 2, "brook": 2, "chan": 2, "scott": 2, "grai": 2, "nick": 2, "ryder": 2, "mikhail": 2, "pavlov": 2, "alethea": 2, "lukasz": 2, "kaiser": 2, "mohammad": 2, "bavarian": 2, "clemen": 2, "winter": 2, "philipp": 2, "tillet": 2, "felip": 2, "petroski": 2, "Such": 2, "dave": 2, "cum": 2, "matthia": 2, "plappert": 2, "fotio": 2, "chantzi": 2, "elizabeth": 2, "barn": 2, "ariel": 2, "herbert": 2, "voss": 2, "william": 2, "hebgen": 2, "guss": 2, "nichol": 2, "paino": 2, "nikola": 2, "tezak": 2, "jie": 2, "tang": 2, "igor": 2, "babuschkin": 2, "suchir": 2, "balaji": 2, "shantanu": 2, "jain": 2, "saunder": 2, "christoph": 2, "hess": 2, "andrew": 2, "carr": 2, "jan": 2, "leik": 2, "josh": 2, "achiam": 2, "vedant": 2, "misra": 2, "evan": 2, "morikawa": 2, "alec": 2, "radford": 2, "matthew": 2, "knight": 2, "mile": 2, "brundag": 2, "mira": 2, "murati": 2, "kati": 2, "mayer": 2, "peter": 2, "welind": 2, "bob": [2, 4], "mcgrew": 2, "dario": 2, "amodei": 2, "sam": 2, "mccandlish": 2, "ilya": 2, "sutskev": 2, "wojciech": 2, "zaremba": 2, "url": [2, 4], "arxiv": [2, 4], "org": [2, 4], "ab": [2, 4], "2107": 2, "03374": 2, "cz": 2, "lianmin": 2, "ying": 2, "sheng": 2, "anastasio": 2, "angelopoulo": 2, "tianl": 2, "dacheng": 2, "hao": 2, "zhang": 2, "banghua": 2, "zhu": 2, "jordan": 2, "gonzalez": 2, "ion": 2, "stoica": 2, "2403": 2, "04132": 2, "cho24a": 2, "francoi": 2, "websit": 2, "arcpriz": 2, "cho24b": 2, "dglh24": 2, "yann": 2, "bal\u00e1z": 2, "galambosi": 2, "perci": 2, "tatsunori": 2, "hashimoto": 2, "debia": 2, "2404": 2, "04475": 2, "fac24a": 2, "wiki": [2, 4], "fac24b": 2, "fac24c": 2, "doc": [2, 3, 4], "model_doc": 2, "gpt2": 2, "fac24d": 2, "cookbook": 2, "en": [2, 4], "llm_judg": 2, "fac24": 2, "fac24f": 2, "space": 2, "blog": 2, "fhwt23": 2, "cl\u00e9mentin": 2, "nathan": 2, "habib": 2, "hbb": 2, "dan": 2, "collin": 2, "burn": 2, "steven": 2, "basart": 2, "andi": 2, "zou": 2, "manta": 2, "mazeika": 2, "dawn": 2, "song": 2, "jacob": 2, "steinhardt": 2, "2009": 2, "03300": 2, "hbd": 2, "20": [2, 4], "ari": 2, "bui": 2, "du": 2, "maxwel": 2, "forb": 2, "yejin": 2, "choi": 2, "curiou": 2, "neural": [2, 4], "degener": 2, "1904": 2, "09751": 2, "hyc": 2, 
"binyuan": 2, "jian": 2, "zeyu": 2, "cui": 2, "jiaxi": 2, "dayiheng": 2, "liu": [2, 4], "lei": 2, "tianyu": 2, "jiajun": 2, "bowen": 2, "yu": 2, "kai": 2, "dang": 2, "coder": 2, "preprint": [2, 4], "2409": 2, "12186": 2, "lx": 2, "zhen": 2, "xiaohan": 2, "xu": 2, "tao": 2, "shen": 2, "jia": 2, "gu": 2, "yuxuan": 2, "lai": 2, "chongyang": 2, "shuai": 2, "ma": 2, "nlg": 2, "2401": 2, "07103": 2, "lbl": 2, "23": 2, "rishi": 2, "bommasani": 2, "toni": 2, "lee": [2, 4], "dimitri": 2, "tsipra": 2, "dilara": 2, "soylu": 2, "michihiro": 2, "yasunaga": 2, "yian": 2, "deepak": 2, "narayanan": 2, "yuhuai": 2, "wu": [2, 4], "ananya": 2, "kumar": 2, "benjamin": 2, "newman": 2, "binhang": 2, "bobbi": 2, "yan": 2, "ce": 2, "christian": 2, "cosgrov": 2, "man": 2, "r\u00e9": 2, "diana": 2, "acosta": 2, "nava": 2, "drew": 2, "hudson": 2, "eric": 2, "zelikman": 2, "esin": 2, "durmu": 2, "faisal": 2, "ladhak": 2, "frieda": 2, "rong": 2, "hongyu": 2, "ren": 2, "huaxiu": 2, "yao": 2, "jue": 2, "keshav": 2, "santhanam": 2, "laurel": 2, "orr": 2, "lucia": 2, "mert": 2, "yuksekgonul": 2, "mirac": 2, "suzgun": 2, "kim": 2, "neel": 2, "guha": 2, "niladri": 2, "chatterji": 2, "omar": 2, "khattab": 2, "henderson": 2, "qian": 2, "huang": 2, "ryan": 2, "chi": [2, 4], "sang": 2, "xie": 2, "shibani": 2, "santurkar": 2, "surya": 2, "ganguli": 2, "icard": 2, "tianyi": 2, "vishrav": 2, "chaudhari": 2, "xuechen": 2, "yifan": 2, "yuhui": 2, "yuta": 2, "koreeda": 2, "2211": 2, "09110": 2, "lhe22": 2, "stephani": 2, "hilton": 2, "owain": 2, "mimic": 2, "falsehood": 2, "2109": 2, "07958": 2, "srr": 2, "aarohi": 2, "abhinav": 2, "rastogi": 2, "abhishek": 2, "rao": 2, "abu": 2, "awal": 2, "md": [2, 4], "shoeb": 2, "abubakar": 2, "abid": 2, "adam": 2, "fisch": 2, "brown": 2, "santoro": 2, "aditya": 2, "gupta": 2, "adri\u00e0": 2, "garriga": 2, "alonso": 2, "agnieszka": 2, "kluska": 2, "aitor": 2, "lewkowycz": 2, "akshat": 2, "agarw": 2, "warstadt": 2, "alexand": [2, 4], "kocurek": 2, "ali": 2, "safaya": 2, "tazarv": 2, "alic": [2, 4], "xiang": 2, "alicia": 2, "parrish": 2, "allen": 2, "nie": 2, "aman": 2, "hussain": 2, "amanda": 2, "askel": 2, "dsouza": 2, "ambros": 2, "slone": 2, "ameet": 2, "rahan": 2, "anantharaman": 2, "iyer": 2, "ander": 2, "andreassen": 2, "andrea": 2, "madotto": 2, "santilli": 2, "stuhlm\u00fcller": 2, "dai": [2, 4], "la": 2, "lampinen": 2, "angela": 2, "jiang": 2, "angelica": 2, "anh": 2, "vuong": 2, "animesh": 2, "anna": 2, "gottardi": 2, "antonio": 2, "norelli": 2, "anu": 2, "venkatesh": 2, "arash": 2, "gholamidavoodi": 2, "arfa": 2, "tabassum": 2, "arul": 2, "menez": 2, "arun": 2, "kirubarajan": 2, "asher": 2, "mullokandov": 2, "ashish": 2, "sabharw": 2, "austin": 2, "herrick": 2, "avia": 2, "efrat": 2, "aykut": 2, "erdem": 2, "ayla": 2, "karaka\u015f": 2, "robert": 2, "bao": 2, "loe": 2, "barret": 2, "zoph": 2, "bart\u0142omiej": 2, "bojanowski": 2, "batuhan": 2, "\u00f6zyurt": 2, "behnam": 2, "hedayatnia": 2, "neyshabur": 2, "inden": 2, "benno": 2, "stein": 2, "berk": 2, "ekmekci": 2, "bill": 2, "yuchen": 2, "blake": 2, "howald": 2, "bryan": 2, "orinion": 2, "cameron": [2, 4], "diao": 2, "dour": 2, "catherin": 2, "stinson": 2, "cedrick": 2, "argueta": 2, "c\u00e9sar": 2, "ferri": 2, "ram\u00edrez": 2, "chandan": 2, "singh": 2, "charl": 2, "rathkopf": 2, "chenlin": 2, "meng": 2, "chitta": 2, "baral": 2, "chiyu": 2, "chri": 2, "callison": 2, "burch": 2, "wait": 2, "voigt": 2, "pott": 2, "cindi": 2, "ramirez": 2, "clara": 2, "rivera": 2, "clemencia": 2, "siro": 2, "colin": 2, "raffel": 2, "courtnei": 2, 
"ashcraft": 2, "cristina": 2, "garbacea": 2, "damien": 2, "sileo": 2, "garrett": 2, "kilman": 2, "roth": 2, "daniel": 2, "freeman": 2, "khashabi": 2, "levi": 2, "mosegu\u00ed": 2, "gonz\u00e1lez": 2, "perszyk": 2, "danni": 2, "hernandez": 2, "danqi": 2, "daphn": 2, "ippolito": 2, "dar": 2, "gilboa": 2, "david": 2, "dohan": 2, "drakard": 2, "jurgen": 2, "debajyoti": 2, "datta": 2, "deni": 2, "emelin": 2, "kleyko": 2, "deniz": 2, "yuret": 2, "derek": 2, "tam": [2, 4], "dieuwk": 2, "hupk": 2, "diganta": 2, "dilyar": 2, "buzan": 2, "coelho": 2, "mollo": 2, "diyi": 2, "dong": 2, "ho": 2, "dylan": 2, "schrader": 2, "ekaterina": 2, "shutova": 2, "ekin": 2, "dogu": 2, "cubuk": 2, "elad": 2, "segal": 2, "eleanor": 2, "hagerman": 2, "donowai": 2, "elli": 2, "pavlick": 2, "emanuel": 2, "rodola": 2, "emma": 2, "lam": 2, "chu": 2, "erkut": 2, "erni": 2, "ethan": 2, "dyer": 2, "jerzak": 2, "eunic": 2, "engefu": 2, "manyasi": 2, "evgenii": 2, "zheltonozhskii": 2, "fanyu": 2, "xia": 2, "fatemeh": 2, "siar": 2, "fernando": 2, "mart\u00ednez": 2, "plume": 2, "francesca": 2, "happ\u00e9": 2, "gaurav": 2, "mishra": 2, "genta": 2, "indra": 2, "winata": 2, "gerard": 2, "melo": 2, "germ\u00e1n": 2, "kruszewski": 2, "giambattista": 2, "parascandolo": 2, "giorgio": 2, "mariani": 2, "gloria": 2, "gonzalo": 2, "jaimovitch": 2, "l\u00f3pez": 2, "gregor": 2, "betz": 2, "gui": 2, "gur": 2, "hana": 2, "galijasev": 2, "hannah": 2, "rashkin": 2, "hannaneh": 2, "hajishirzi": 2, "harsh": 2, "mehta": 2, "hayden": 2, "bogar": 2, "henri": 2, "shevlin": 2, "hinrich": 2, "sch\u00fctze": 2, "hiromu": 2, "yakura": 2, "hongm": 2, "hugh": 2, "mee": 2, "wong": 2, "ian": 2, "ng": 2, "isaac": 2, "nobl": 2, "jaap": 2, "jumelet": 2, "jack": 2, "geissing": 2, "jackson": 2, "kernion": 2, "jaehoon": 2, "jaim": 2, "fern\u00e1ndez": 2, "fisac": 2, "jame": 2, "simon": 2, "koppel": 2, "koco\u0144": 2, "jana": 2, "thompson": 2, "janel": 2, "wingfield": 2, "jarema": 2, "radom": 2, "jascha": 2, "sohl": 2, "dickstein": 2, "jason": 2, "phang": 2, "yosinski": 2, "jekaterina": 2, "novikova": 2, "jell": 2, "bosscher": 2, "jennif": 2, "marsh": 2, "jeremi": 2, "jeroen": 2, "taal": 2, "jess": 2, "engel": 2, "jesujoba": 2, "alabi": 2, "jiacheng": 2, "jiam": 2, "jillian": 2, "joan": 2, "waweru": 2, "john": 2, "burden": 2, "miller": 2, "bali": 2, "jonathan": 2, "batcheld": 2, "berant": 2, "j\u00f6rg": 2, "frohberg": 2, "jo": 2, "rozen": 2, "jose": 2, "orallo": 2, "boudeman": 2, "guerr": 2, "jone": 2, "joshua": 2, "tenenbaum": 2, "rule": [2, 3, 4], "joyc": 2, "chua": 2, "kamil": 2, "kanclerz": 2, "karen": 2, "livescu": 2, "karl": 2, "krauth": 2, "karthik": 2, "gopalakrishnan": 2, "katerina": 2, "ignatyeva": 2, "katja": 2, "markert": 2, "kaustubh": 2, "dhole": 2, "kevin": 2, "gimpel": 2, "omondi": 2, "kori": 2, "mathewson": 2, "kristen": 2, "chiafullo": 2, "ksenia": 2, "shkaruta": 2, "shridhar": 2, "kyle": 2, "mcdonel": 2, "richardson": 2, "laria": 2, "reynold": 2, "leo": 2, "gao": 2, "liam": 2, "dugan": 2, "lianhui": 2, "qin": 2, "lidia": 2, "contrera": 2, "ochando": 2, "loui": 2, "morenc": 2, "luca": [2, 4], "moschella": 2, "luci": 2, "ludwig": 2, "schmidt": 2, "luheng": 2, "lui": 2, "olivero": 2, "col\u00f3n": 2, "luke": 2, "metz": 2, "l\u00fctfi": 2, "kerem": 2, "\u015fenel": 2, "maarten": 2, "bosma": 2, "sap": 2, "maartj": 2, "hoev": 2, "maheen": 2, "farooqi": 2, "manaal": 2, "faruqui": 2, "marco": 2, "baturan": 2, "marelli": 2, "maru": 2, "maria": 2, "quintana": 2, "mari": 2, "tolkiehn": 2, "mario": 2, "giulianelli": 2, "martha": 2, "martin": 2, 
"potthast": 2, "l": 2, "leavitt": 2, "hagen": 2, "m\u00e1ty\u00e1": 2, "schubert": 2, "medina": 2, "orduna": 2, "baitemirova": 2, "melodi": 2, "arnaud": 2, "melvin": 2, "mcelrath": 2, "yee": 2, "cohen": 2, "ivanitskii": 2, "starritt": 2, "strube": 2, "micha\u0142": 2, "sw\u0119drowski": 2, "michel": 2, "bevilacqua": 2, "mihir": 2, "kale": 2, "cain": 2, "mime": 2, "mitch": 2, "walker": 2, "mo": 2, "tiwari": 2, "mohit": 2, "bansal": 2, "moin": 2, "aminnaseri": 2, "mor": 2, "geva": 2, "mozhdeh": 2, "gheini": 2, "mukund": 2, "varma": 2, "nanyun": 2, "peng": 2, "nayeon": 2, "neta": 2, "krakov": 2, "doiron": 2, "nicol": 2, "martinez": 2, "nikita": 2, "nangia": 2, "nikla": 2, "decker": 2, "muennighoff": 2, "nitish": 2, "shirish": 2, "keskar": 2, "niveditha": 2, "noah": 2, "constant": 2, "fiedel": 2, "nuan": 2, "wen": 2, "oliv": 2, "agha": 2, "elbaghdadi": 2, "omer": 2, "moreno": 2, "casar": 2, "parth": 2, "doshi": 2, "pascal": 2, "fung": 2, "paul": 2, "pu": 2, "vicol": 2, "pegah": 2, "alipoormolabashi": 2, "peiyuan": 2, "liao": 2, "eckerslei": 2, "phu": 2, "mon": 2, "htut": 2, "pinyu": 2, "hwang": 2, "piotr": 2, "mi\u0142kowski": 2, "piyush": 2, "patil": 2, "pouya": 2, "pezeshkpour": 2, "priti": 2, "oli": 2, "qiaozhu": 2, "mei": 2, "qing": 2, "lyu": 2, "qinlang": 2, "rabin": 2, "banjad": 2, "rachel": 2, "etta": 2, "rudolph": 2, "raefer": 2, "rahel": 2, "haback": 2, "ramon": 2, "risco": 2, "rapha\u00ebl": 2, "milli\u00e8r": 2, "rhythm": 2, "garg": 2, "rif": 2, "saurou": 2, "riku": 2, "arakawa": 2, "robb": 2, "raymaek": 2, "frank": 2, "rohan": 2, "sikand": 2, "roman": 2, "novak": 2, "sitelew": 2, "ronan": 2, "lebra": 2, "rosann": 2, "rowan": 2, "rui": [2, 4], "ruslan": 2, "salakhutdinov": 2, "stoval": 2, "teehan": 2, "rylan": 2, "sahib": 2, "saif": 2, "sajant": 2, "anand": 2, "dillav": 2, "shleifer": 2, "wiseman": 2, "samuel": 2, "gruetter": 2, "bowman": 2, "schoenholz": 2, "sanghyun": 2, "han": 2, "sanjeev": 2, "kwatra": 2, "sarah": 2, "rou": 2, "sarik": 2, "ghazarian": 2, "sayan": 2, "ghosh": 2, "sean": 2, "casei": 2, "sebastian": 2, "bischoff": 2, "gehrmann": 2, "schuster": 2, "sepideh": 2, "sadeghi": 2, "shadi": 2, "hamdan": 2, "sharon": 2, "zhou": 2, "shashank": 2, "sherri": 2, "shi": 2, "shikhar": 2, "shima": 2, "asaadi": 2, "shixiang": 2, "shane": 2, "shubh": 2, "pachchigar": 2, "shubham": 2, "toshniw": 2, "shyam": 2, "upadhyai": 2, "shyamolima": 2, "debnath": 2, "siamak": 2, "shakeri": 2, "thormey": 2, "melzi": 2, "siva": 2, "reddi": 2, "sneha": 2, "priscilla": 2, "makini": 2, "soo": 2, "hwan": 2, "spencer": 2, "toren": 2, "sriharsha": 2, "hatwar": 2, "stanisla": 2, "dehaen": 2, "stefan": 2, "divic": 2, "stefano": 2, "ermon": 2, "stella": 2, "biderman": 2, "stephen": 2, "prasad": 2, "piantadosi": 2, "stuart": 2, "shieber": 2, "summer": 2, "misherghi": 2, "svetlana": 2, "kiritchenko": 2, "swaroop": 2, "tal": 2, "linzen": 2, "tariq": 2, "tatsu": 2, "te": 2, "th\u00e9o": 2, "desbord": 2, "theodor": 2, "rothschild": 2, "phan": 2, "tiberiu": 2, "nkinyili": 2, "timo": 2, "schick": 2, "timofei": 2, "kornev": 2, "titu": 2, "tunduni": 2, "tobia": 2, "gerstenberg": 2, "trenton": 2, "trishala": 2, "neeraj": 2, "tushar": 2, "khot": 2, "tyler": 2, "shultz": 2, "uri": 2, "shaham": 2, "vera": 2, "demberg": 2, "victoria": 2, "nyamai": 2, "vika": 2, "raunak": 2, "vinai": 2, "ramasesh": 2, "udai": 2, "prabhu": 2, "vishakh": 2, "padmakumar": 2, "vivek": 2, "srikumar": 2, "fedu": 2, "wout": 2, "vossen": 2, "xiaoyu": 2, "tong": 2, "xinran": 2, "zhao": 2, "xinyi": 2, "xudong": 2, "yadollah": 2, "yaghoobzadeh": 
2, "yair": 2, "lakretz": 2, "yangqiu": 2, "yasaman": 2, "bahri": 2, "yichi": 2, "yide": 2, "yifu": 2, "yonatan": 2, "belinkov": 2, "hou": 2, "yufang": 2, "yuntao": 2, "bai": 2, "zachari": 2, "seid": 2, "zhuoy": 2, "zijian": 2, "ziji": 2, "j": [2, 4], "zirui": 2, "ziyi": 2, "imit": 2, "game": 2, "extrapol": 2, "2206": 2, "04615": 2, "wpn": 2, "19": 2, "yada": 2, "pruksachatkun": 2, "amanpreet": 2, "julian": 2, "felix": 2, "hill": 2, "stickier": 2, "wsm": 2, "1804": 2, "07461": 2, "wtb": 2, "22": 2, "yi": [2, 4], "tai": 2, "borgeaud": 2, "dani": 2, "yogatama": 2, "denni": 2, "donald": 2, "metzler": 2, "ed": 2, "h": 2, "oriol": 2, "vinyal": 2, "jeff": 2, "dean": 2, "07682": 2, "wdr": 2, "doolei": 2, "manlei": 2, "arka": 2, "pal": 2, "feuer": 2, "siddhartha": 2, "ravid": 2, "shwartz": 2, "ziv": 2, "khalid": 2, "saifullah": 2, "siddartha": 2, "naidu": 2, "chinmai": 2, "hegd": 2, "lecun": 2, "tom": 2, "goldstein": 2, "willi": 2, "neiswang": 2, "micah": 2, "goldblum": 2, "2406": 2, "19314": 2, "yyh": 2, "baosong": 2, "bo": 2, "chengpeng": 2, "chengyuan": 2, "fei": 2, "guant": 2, "haoran": 2, "huan": 2, "jialong": 2, "jialin": 2, "jianhong": 2, "tu": 2, "jianwei": 2, "jianxin": 2, "jin": 2, "jingren": 2, "jinz": 2, "jinzheng": 2, "junyang": 2, "keme": 2, "lu": 2, "keqin": 2, "kexin": 2, "mingfeng": 2, "xue": 2, "na": 2, "ni": 2, "pei": 2, "ru": 2, "men": 2, "ruiz": 2, "runji": 2, "shiji": 2, "sinan": 2, "tan": 2, "tianhang": 2, "tianhao": 2, "wenbin": 2, "ge": 2, "xiaodong": 2, "deng": 2, "xiaohuan": 2, "xingzhang": 2, "xinyu": 2, "xipin": 2, "xuancheng": 2, "fan": 2, "yichang": 2, "wan": 2, "yunfei": 2, "yuqiong": 2, "zhenru": 2, "zhihao": 2, "2407": 2, "10671": 2, "zc": 2, "siyuan": 2, "zhuang": 2, "zhanghao": 2, "yonghao": 2, "zi": 2, "zhuohan": 2, "xing": 2, "2306": 2, "05685": 2, "huggingface24": 2, "12": [2, 3], "06": [2, 4], "metaai24": 2, "promptfoo24": 2, "toolkit": 2, "www": 2, "dev": 2, "go": [3, 4], "far": 3, "possibli": 3, "eliot": 3, "english": 3, "thumb": 3, "\u00be": 3, "max_output_token": 3, "4096": 3, "16384": 3, "contrari": 3, "surpass": 3, "stop": 3, "mid": 3, "truncat": 3, "max_input_token": 3, "input_cost_per_token": 3, "output_cost_per_token": 3, "11b": 3, "v1": 3, "128000": 3, "5e": 3, "sonnet": 3, "20241022": 3, "8192": 3, "200000": 3, "3e": 3, "0613": 3, "6e": 3, "04": 3, "09": 3, "1e": 3, "gemini": 3, "flash": 3, "002": 3, "1048576": 3, "pro": 3, "2097152": 3, "05e": 3, "pose": [3, 4], "incomplet": 3, "extens": [3, 4], "articl": 3, "abruptli": 3, "cut": 3, "disrupt": 3, "shallow": 3, "thorough": 3, "receiv": 3, "partial": 3, "dissatisfact": 3, "frustrat": 3, "educ": 3, "creation": 3, "feasibl": 3, "split": 3, "previou": [3, 4], "10k": 3, "diagram": 3, "charactertextsplitt": 3, "tiktoken": 3, "sequenti": 3, "chain": 3, "newlin": 3, "broadli": [3, 4], "decid": 3, "want": 3, "sure": [3, 4], "lost": 3, "cheap": 3, "speciali": 3, "advantag": [3, 4], "naiv": 3, "period": 3, "nltk": 3, "spaci": 3, "recurs": 3, "divid": 3, "hierarch": 3, "manner": [3, 4], "talk": 3, "theme": 3, "topic": [3, 4], "splitter": 3, "markdown": 3, "html": [3, 4], "get_chunk": 3, "chunk_siz": 3, "chunk_overlap": 3, "langchain_text_splitt": 3, "text_splitt": 3, "from_tiktoken_encod": 3, "split_text": 3, "persona": 3, "assum": 3, "task": [3, 4], "action": 3, "langchain_cor": [3, 4], "prompttempl": 3, "get_base_prompt_templ": 3, "base_prompt": [3, 4], "from_templ": 3, "llmchain": 3, "togeth": 3, "parser": [3, 4], "output_pars": 3, "stroutputpars": 3, "langchain_commun": 3, "chat_model": 3, "chatlitellm": 
3, "get_llm_chain": 3, "prompt_templ": [3, 4], "llm_chain": [3, 4], "api_key_label": 3, "upper": 3, "_api_kei": 3, "api_kei": 3, "get_dynamic_prompt_templ": 3, "get_dynamic_prompt_param": 3, "prompt_param": 3, "part_idx": 3, "total_part": 3, "chat_context": 3, "origin": [3, 4], "part": [3, 4], "total": [3, 4], "param": 3, "dynamic_prompt_param": 3, "copi": 3, "elif": 3, "last": [3, 4], "merg": 3, "concaten": 3, "generate_report": 3, "input_cont": 3, "llm_model_nam": 3, "report_part": 3, "num_part": 3, "dinam": 3, "priovid": 3, "enumer": 3, "invok": [3, 4], "cummul": 3, "join": 3, "max_chunk_s": 3, "max_chunk_overlap": 3, "latest": [3, 4], "readabl": 3, "apple_report": 3, "300": 3, "disclos": [3, 4], "luation": 3, "oblig": 3, "cash": 3, "disciplin": 3, "smooth": 3, "upon": 3, "subhead": 3, "adher": [3, 4], "revenu": [3, 4], "segment": [3, 4], "liquid": 3, "capit": [3, 4], "despit": [3, 4], "depth": 3, "overlook": 3, "mitig": [3, 4], "fit": [3, 4], "within": [3, 4], "preserv": 3, "easier": [3, 4], "preprocess": 3, "enhanc": [3, 4], "necessit": 3, "meticul": 3, "retain": 3, "necessari": 3, "seamlessli": 3, "circumv": 3, "therebi": 3, "escal": 3, "frequenc": 3, "volum": 3, "bottleneck": 3, "latenc": 3, "friendli": 3, "mustafa": 3, "suleyman": 3, "infinit": 3, "amount": [3, 4], "fewer": 3, "compress": 3, "progress": 3, "condens": 3, "adjust": [3, 4], "constrain": [3, 4], "collect": 3, "versatil": 3, "drive": [3, 4], "grace": 3, "fallback": 3, "empow": 3, "crucial": [3, 4], "stai": 3, "full": [3, 4], "langchain24": 3, "how_to": 3, "07": [3, 4], "freedom": 4, "thrive": 4, "julia": 4, "easili": 4, "notebook": 4, "overrid": 4, "response_cont": 4, "wow": 4, "lot": 4, "breakdown": 4, "stream": 4, "portfolio": 4, "impress": 4, "notic": 4, "march": 4, "29": 4, "huge": 4, "investor": 4, "figur": 4, "compli": 4, "ye": 4, "date": 4, "serious": 4, "is_json": 4, "myjson": 4, "except": 4, "valueerror": 4, "googl": 4, "survei": 4, "51": 4, "trial": 4, "elicit": 4, "consum": 4, "wrangl": 4, "conform": 4, "ad": 4, "hoc": 4, "streamlin": 4, "subsequ": 4, "modul": 4, "dataset": 4, "unwant": 4, "neg": 4, "ui": 4, "restrict": 4, "mobil": 4, "devic": 4, "overflow": 4, "overwhelm": 4, "twitter": 4, "youtub": 4, "impos": 4, "publish": 4, "successfulli": 4, "adopt": 4, "emploi": 4, "schema": 4, "blueprint": 4, "nativ": 4, "regular": 4, "json_format": 4, "person1": 4, "q1": 4, "person2": 4, "net": 4, "margin": 4, "materi": 4, "nest": 4, "todai": 4, "programmat": 4, "thellm": 4, "unend": 4, "whitespac": 4, "until": 4, "forget": 4, "throw": 4, "appear": 4, "somewher": 4, "json_object": 4, "628": 4, "553": 4, "sheer": 4, "115": 4, "823": 4, "circul": 4, "plai": 4, "vertex": 4, "releas": 4, "suppli": 4, "so": 4, "worri": 4, "enum": 4, "No": 4, "incorrectli": 4, "refus": 4, "simpler": 4, "strongli": 4, "entiti": 4, "place": 4, "secextract": 4, "mentioned_ent": 4, "mentioned_plac": 4, "extract_from_sec_fil": 4, "sec_filing_text": 4, "hint": 4, "attribut": 4, "prompt_extract": 4, "sec_extract": 4, "nasdaq": 4, "llc": 4, "washington": 4, "cupertino": 4, "usabl": 4, "beg": 4, "with_structured_output": 4, "runnabl": 4, "typeddict": 4, "qu": 4, "langchain_openai": 4, "chatopenai": 4, "chatprompttempl": 4, "extract_from_sec_filing_langchain": 4, "structured_llm": 4, "from_messag": 4, "sec_extraction_langchain": 4, "found": 4, "hood": 4, "logit": 4, "raw": 4, "network": 4, "regex": 4, "strong": 4, "enough": 4, "qwen": 4, "label": 4, "unexpect": 4, "malform": 4, "pass": 4, "sec_extraction_outlin": 4, "zsp": 4, "zicorp": 4, 
"phenomenon": 4, "popular": 4, "cpp": 4, "gbnf": 4, "ggml": 4, "bnf": 4, "ggerganov": 4, "accomplish": 4, "formal": 4, "backu": 4, "naur": 4, "wikipedia": 4, "contributor": 4, "strictli": 4, "soon": 4, "curl": 4, "fssl": 4, "sh": 4, "did": 4, "extract_entities_from_sec_fil": 4, "suffix": 4, "ollama_structured_output_prompt_suffix": 4, "ollama_structured_output_temperatur": 4, "mistral": 4, "llama2": 4, "uncensor": 4, "model_json_schema": 4, "response_json": 4, "AND": 4, "wrapper": 4, "exllama2": 4, "mlx": 4, "lm": 4, "enterpris": 4, "commerci": 4, "medium": 4, "low": 4, "done": 4, "know": 4, "chanc": 4, "connect": 4, "encourag": 4, "correctli": 4, "area": 4, "mix": 4, "famili": 4, "furthermor": 4, "nonetheless": 4, "evid": 4, "studi": 4, "wrap": 4, "map": 4, "gemma": 4, "uncov": 4, "wors": 4, "extran": 4, "dispar": 4, "preval": 4, "outdat": 4, "rapidli": 4, "fashion": 4, "remark": 4, "me": 4, "speak": 4, "freeli": 4, "aider": 4, "decod": 4, "hinder": 4, "outweigh": 4, "team": 4, "rebutt": 4, "argu": 4, "v": 4, "compel": 4, "reproduct": 4, "paint": 4, "pictur": 4, "publicli": 4, "independ": 4, "verif": 4, "dottxt": 4, "flaw": 4, "believ": 4, "led": 4, "inaccur": 4, "reconcil": 4, "uneven": 4, "didn": 4, "conflat": 4, "argument": 4, "drawback": 4, "unlock": 4, "wider": 4, "thank": 4, "pfiffer": 4, "hi": 4, "aid24": 4, "dot24": 4, "sai": 4, "demo": 4, "tree": 4, "gge24": 4, "blob": 4, "readm": 4, "llf": 4, "xieyang": 4, "frederick": 4, "fiannaca": 4, "terri": 4, "koo": 4, "dixon": 4, "carri": 4, "cai": 4, "ea": 4, "york": 4, "ny": 4, "usa": 4, "machineri": 4, "doi": 4, "1145": 4, "3613905": 4, "3650756": 4, "ln": 4, "xuan": 4, "hai": 4, "nguyen": 4, "ngoc": 4, "tiviati": 4, "sim": 4, "hieu": 4, "dao": 4, "shafiq": 4, "joti": 4, "kenji": 4, "kawaguchi": 4, "nanci": 4, "min": 4, "yen": 4, "kan": 4, "2408": 4, "08656": 4, "out24": 4, "io": 4, "twt": 4, "zhi": 4, "cheng": 4, "kuang": 4, "tsai": 4, "chieh": 4, "hung": 4, "yun": 4, "nung": 4, "02442": 4, "wikipediacontributors24": 4, "wiktionari": 4, "naur_form": 4}, "objects": {}, "objtypes": {}, "objnames": {}, "titleterms": {"introduct": [0, 1, 4], "content": [0, 2, 3, 4], "core": 0, "challeng": 0, "we": 0, "ll": 0, "address": 0, "A": [0, 1], "practic": [0, 1, 4], "approach": 0, "note": 0, "perspect": 0, "who": 0, "thi": 0, "book": 0, "i": 0, "For": 0, "outcom": 0, "prerequisit": 0, "set": 0, "up": 0, "your": 0, "environ": 0, "python": 0, "setup": 0, "api": [0, 4], "kei": [0, 2, 3], "configur": 0, "code": 0, "repositori": 0, "troubleshoot": 0, "common": 0, "issu": 0, "about": 0, "author": 0, "": 0, "tame": 1, "llm": [1, 2], "guid": 1, "pitfal": 1, "open": 1, "sourc": 1, "softwar": [1, 2], "chapter": 1, "1": [1, 3], "2": [1, 3], "wrestl": [1, 4], "structur": [1, 4], "output": [1, 3, 4], "3": [1, 3], "input": 1, "size": [1, 3], "length": [1, 3], "limit": [1, 3], "4": [1, 3], "5": 1, "The": [1, 2], "eval": [1, 2], "gap": [1, 2], "6": 1, "hallucin": 1, "realiti": 1, "7": 1, "safeti": 1, "concern": 1, "8": 1, "cost": [1, 3], "factor": 1, "9": 1, "break": 1, "free": 1, "from": 1, "cloud": 1, "provid": [1, 4], "appendix": 1, "tool": [1, 2, 4], "resourc": 1, "non": 2, "determinist": 2, "gener": [2, 3], "machin": 2, "temperatur": 2, "sampl": 2, "spectrum": 2, "emerg": 2, "properti": 2, "problem": [2, 3, 4], "statement": [2, 3, 4], "tradit": 2, "v": 2, "design": 2, "applic": 2, "test": 2, "requir": 2, "matrix": 2, "conceptu": 2, "overview": 2, "consider": [2, 3], "metric": 2, "evalu": 2, "task": 2, "model": [2, 3], "base": [2, 3], "human": 2, "benchmark": 
2, "leaderboard": 2, "lightev": 2, "mmlu": 2, "econometr": 2, "dataset": 2, "famili": 2, "us": 2, "langchain": [2, 4], "promptfoo": 2, "refer": [2, 3, 4], "what": 3, "ar": 3, "token": 3, "comparison": [3, 4], "across": 3, "chunk": 3, "contextu": 3, "link": 3, "long": 3, "form": 3, "step": 3, "write": 3, "prompt": [3, 4], "templat": 3, "construct": 3, "dynam": 3, "paramet": 3, "report": 3, "exampl": 3, "usag": 3, "discuss": [3, 4], "implic": 3, "futur": 3, "conclus": [3, 4], "user": 4, "need": 4, "solut": 4, "strategi": 4, "techniqu": 4, "One": 4, "shot": 4, "specif": 4, "json": 4, "mode": 4, "outlin": 4, "ollama": 4, "compar": 4, "framework": 4, "best": 4, "research": 4, "ongo": 4, "debat": 4, "acknowledg": 4}, "envversion": {"sphinx.domains.c": 2, "sphinx.domains.changeset": 1, "sphinx.domains.citation": 1, "sphinx.domains.cpp": 8, "sphinx.domains.index": 1, "sphinx.domains.javascript": 2, "sphinx.domains.math": 2, "sphinx.domains.python": 3, "sphinx.domains.rst": 2, "sphinx.domains.std": 2, "sphinx.ext.intersphinx": 1, "sphinxcontrib.bibtex": 9, "sphinx": 57}, "alltitles": {"Introduction": [[0, "introduction"], [4, "introduction"]], "Contents": [[0, "contents"], [2, "contents"], [3, "contents"], [4, "contents"]], "Core Challenges We\u2019ll Address": [[0, "core-challenges-we-ll-address"]], "A Practical Approach": [[0, "a-practical-approach"]], "A Note on Perspective": [[0, "a-note-on-perspective"]], "Who This Book Is For": [[0, "who-this-book-is-for"]], "Outcomes": [[0, "outcomes"]], "Prerequisites": [[0, "prerequisites"]], "Setting Up Your Environment": [[0, "setting-up-your-environment"]], "Python Environment Setup": [[0, "python-environment-setup"]], "API Keys Configuration": [[0, "api-keys-configuration"]], "Code Repository": [[0, "code-repository"]], "Troubleshooting Common Issues": [[0, "troubleshooting-common-issues"]], "About the Author(s)": [[0, "about-the-author-s"]], "Taming LLMs": [[1, "taming-llms"]], "A Practical Guide to LLM Pitfalls with Open Source Software": [[1, "a-practical-guide-to-llm-pitfalls-with-open-source-software"]], "Chapter 1: Introduction": [[1, "chapter-1-introduction"]], "Chapter 2: Wrestling with Structured Output": [[1, "chapter-2-wrestling-with-structured-output"]], "Chapter 3: Input Size and Length Limitations": [[1, "chapter-3-input-size-and-length-limitations"]], "Chapter 4: Output Size and Length Limitations": [[1, "chapter-4-output-size-and-length-limitations"]], "Chapter 5: The Evals Gap": [[1, "chapter-5-the-evals-gap"]], "Chapter 6: Hallucination: The Reality Gap": [[1, "chapter-6-hallucination-the-reality-gap"]], "Chapter 7: Safety Concerns": [[1, "chapter-7-safety-concerns"]], "Chapter 8: The Cost Factor": [[1, "chapter-8-the-cost-factor"]], "Chapter 9: Breaking Free from Cloud Providers": [[1, "chapter-9-breaking-free-from-cloud-providers"]], "Appendix A: Tools and Resources": [[1, "appendix-a-tools-and-resources"]], "The Evals Gap": [[2, "the-evals-gap"]], "Non-Deterministic Generative Machines": [[2, "non-deterministic-generative-machines"]], "Temperature and Sampling": [[2, "temperature-and-sampling"]], "The Temperature Spectrum": [[2, "the-temperature-spectrum"]], "Emerging Properties": [[2, "emerging-properties"]], "Problem Statement": [[2, "problem-statement"], [3, "problem-statement"], [4, "problem-statement"]], "Evals of Traditional Software vs LLMs": [[2, "evals-table"]], "Evals Design": [[2, "evals-design"]], "LLM Application Testing Requirements Matrix": [[2, "validation-requirements"]], "Conceptual Overview": [[2, 
"conceptual-overview"]], "Design Considerations": [[2, "design-considerations"]], "Metrics": [[2, "metrics"]], "Key Metrics for Evaluating Generative Tasks": [[2, "key-metrics"]], "Evaluators": [[2, "evaluators"]], "Model-Based Evaluation": [[2, "model-based-evaluation"]], "Human-Based Evaluation": [[2, "human-based-evaluation"]], "Evaluating Evaluators": [[2, "evaluating-evaluators"]], "Benchmarks and Leaderboards": [[2, "benchmarks-and-leaderboards"]], "Tools": [[2, "tools"]], "LightEval": [[2, "lighteval"]], "MMLU Econometrics Task Dataset sample": [[2, "mmlu-econometrics"]], "Model Families Evaluated Using LightEval": [[2, "model-families"]], "LangChain": [[2, "langchain"], [4, "langchain"]], "PromptFoo": [[2, "promptfoo"]], "References": [[2, "references"], [3, "references"], [4, "references"]], "Output Size Limitations": [[3, "output-size-limitations"]], "What are Token Limits?": [[3, "what-are-token-limits"]], "Token Cost and Length Limitation Comparison Across Key Models": [[3, "token-cost-table"]], "Content Chunking with Contextual Linking": [[3, "content-chunking-with-contextual-linking"]], "Generating long-form content": [[3, "generating-long-form-content"]], "Step 1: Chunking the Content": [[3, "step-1-chunking-the-content"]], "Step 2: Writing the Base Prompt Template": [[3, "step-2-writing-the-base-prompt-template"]], "Step 3: Constructing Dynamic Prompt Parameters": [[3, "step-3-constructing-dynamic-prompt-parameters"]], "Step 4: Generating the Report": [[3, "step-4-generating-the-report"]], "Example Usage": [[3, "example-usage"]], "Discussion": [[3, "discussion"], [4, "discussion"]], "Implications": [[3, "implications"]], "Future Considerations": [[3, "future-considerations"]], "Conclusion": [[3, "conclusion"], [4, "conclusion"]], "Wrestling with Structured Output": [[4, "wrestling-with-structured-output"]], "User Needs": [[4, "user-needs"]], "Solutions": [[4, "solutions"]], "Strategies": [[4, "strategies"]], "Techniques and Tools": [[4, "techniques-and-tools"]], "One-Shot Prompts": [[4, "one-shot-prompts"]], "Structured Output with Provider-Specific APIs": [[4, "structured-output-with-provider-specific-apis"]], "JSON Mode": [[4, "json-mode"]], "Outlines": [[4, "outlines"]], "Ollama": [[4, "ollama"]], "Comparing Solutions": [[4, "comparing-solutions"]], "Structured Output Frameworks Comparison": [[4, "structured-output-frameworks"]], "Best Practices": [[4, "best-practices"]], "Research and Ongoing Debate": [[4, "research-and-ongoing-debate"]], "Acknowledgements": [[4, "acknowledgements"]]}, "indexentries": {}}) \ No newline at end of file diff --git a/tamingllms/_build/jupyter_execute/markdown/intro.ipynb b/tamingllms/_build/jupyter_execute/markdown/intro.ipynb index 2b14961..17f271d 100644 --- a/tamingllms/_build/jupyter_execute/markdown/intro.ipynb +++ b/tamingllms/_build/jupyter_execute/markdown/intro.ipynb @@ -2,7 +2,7 @@ "cells": [ { "cell_type": "markdown", - "id": "bea94820", + "id": "05f22589", "metadata": {}, "source": [ "(intro)=\n", diff --git a/tamingllms/_build/jupyter_execute/notebooks/evals.ipynb b/tamingllms/_build/jupyter_execute/notebooks/evals.ipynb index 5bddf1f..83ddad1 100644 --- a/tamingllms/_build/jupyter_execute/notebooks/evals.ipynb +++ b/tamingllms/_build/jupyter_execute/notebooks/evals.ipynb @@ -1242,6 +1242,8 @@ "\n", "These features make the ARC benchmark a unique test of machine intelligence, focusing on the ability to adapt to novelty and solve problems without relying heavily on memorization. 
This is more aligned with the concept of general intelligence, which emphasizes the ability to learn efficiently and tackle new challenges.\n", "\n", + "The ARC-AGI benchmark remained unbeaten for five years as of December 2024 (a minimum score of 85% is required to win) {cite}`arcprizeresults2024`. While deep learning has significantly advanced in recent years, pure deep learning approaches perform poorly on the ARC-AGI benchmark. This is because traditional deep learning relies on relating new situations to those encountered during training and lacks the ability to adapt or recombine knowledge for entirely new tasks. ARC Prize 2024 spurred the development of novel AGI reasoning techniques, leading to a significant increase in the state-of-the-art score on the ARC-AGI private evaluation set from 33% in 2023 to 55.5% in 2024. A key takeaway is that algorithmic improvements, rather than massive computational resources, may be key to exceeding the target score for the ARC-AGI benchmark.\n", + "\n", "As language models continue to advance in capability and complexity, evaluation frameworks must evolve. Modern benchmarks increasingly incorporate tests for nuanced reasoning, ethical decision-making, and emergent capabilities that weren't previously measurable. This ongoing evolution reflects a deeper understanding that the true value of language models lies not in achieving high scores on standardized tests with narrow task-specific metrics, but in their ability to meaningfully contribute to human understanding and help solve real-world problems while demonstrating the ability to learn and adapt to new tasks." ] }, diff --git a/tamingllms/notebooks/evals.ipynb b/tamingllms/notebooks/evals.ipynb index 3826fbc..3d8b9a6 100644 --- a/tamingllms/notebooks/evals.ipynb +++ b/tamingllms/notebooks/evals.ipynb @@ -1242,6 +1242,8 @@ "\n", "These features make the ARC benchmark a unique test of machine intelligence, focusing on the ability to adapt to novelty and solve problems without relying heavily on memorization. This is more aligned with the concept of general intelligence, which emphasizes the ability to learn efficiently and tackle new challenges.\n", "\n", + "The ARC-AGI benchmark remained unbeaten for five years as of December 2024 (a minimum score of 85% is required to win) {cite}`arcprizeresults2024`. While deep learning has significantly advanced in recent years, pure deep learning approaches perform poorly on the ARC-AGI benchmark. This is because traditional deep learning relies on relating new situations to those encountered during training and lacks the ability to adapt or recombine knowledge for entirely new tasks. ARC Prize 2024 spurred the development of novel AGI reasoning techniques, leading to a significant increase in the state-of-the-art score on the ARC-AGI private evaluation set from 33% in 2023 to 55.5% in 2024. A key takeaway is that algorithmic improvements, rather than massive computational resources, may be key to exceeding the target score for the ARC-AGI benchmark.\n", + "\n", "As language models continue to advance in capability and complexity, evaluation frameworks must evolve. Modern benchmarks increasingly incorporate tests for nuanced reasoning, ethical decision-making, and emergent capabilities that weren't previously measurable. 
This ongoing evolution reflects a deeper understanding that the true value of language models lies not in achieving high scores on standardized tests with narrow task-specific metrics, but in their ability to meaningfully contribute to human understanding and help solve real-world problems while demonstrating the ability to learn and adapt to new tasks." ] },
diff --git a/tamingllms/references.bib b/tamingllms/references.bib
index 92af41b..023c41b 100644
--- a/tamingllms/references.bib
+++ b/tamingllms/references.bib
@@ -374,3 +374,12 @@ @misc{arcprize2024
 howpublished={ARC Prize Website},
 url={https://arcprize.org/},
 }
+
+@misc{arcprizeresults2024,
+ title={ARC Prize 2024 Results},
+ author={Francois Chollet},
+ year={2024},
+ howpublished={ARC Prize Website},
+ url={https://arcprize.org/2024-results},
+}