That model was trained in part using their unreleased R1 "reasoning" model. Today they've released R1 itself, along with a whole family of new models derived from that base.

There's a lot of stuff in the new release.

DeepSeek-R1-Zero appears to be the base model. It's over 650GB in size and, like most of their other releases, is under a clean MIT license. DeepSeek warn that "DeepSeek-R1-Zero encounters challenges such as endless repetition, poor readability, and language mixing." ... so they also released:

DeepSeek-R1, which "incorporates cold-start data before RL" and "achieves performance comparable to OpenAI-o1 across math, code, and reasoning tasks". That one is also MIT licensed, and is a similar size.

I don't have the ability to run models larger than about 50GB (I have an M2 with 64GB of RAM), so neither of these two models are something I can easily play with myself. That's where the new distilled models come in.

To support the research community, we have open-sourced DeepSeek-R1-Zero, DeepSeek-R1, and six dense models distilled from DeepSeek-R1 based on Llama and Qwen.

This is an interesting flex! They have models based on Qwen 2.5 (14B, 32B, Math 1.5B and Math 7B) and Llama 3 (Llama-3.1 8B and Llama 3.3 70B Instruct).

Weirdly those Llama models have an MIT license attached, which I'm not sure is compatible with the underlying Llama license. Qwen models are Apache licensed so maybe MIT is OK?

(I also just noticed the MIT license files say "Copyright (c) 2023 DeepSeek" so they may need to pay a bit more attention to how they copied those in.)

Licensing aside, these distilled models are fascinating beasts.

Running DeepSeek-R1-Distill-Llama-8B-GGUF

Quantized versions are already starting to show up. So far I've tried just one of those - unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF released by Unsloth AI - and it's really fun to play with.

I'm running it using the combination of Ollama, LLM and the llm-ollama plugin.

First I fetched and ran the model using Ollama itself.
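Something like the following should do it, assuming the Unsloth repository name on Hugging Face and a Q8_0 quantization:

```bash
# Fetch the distilled 8B GGUF from Hugging Face and start an interactive chat
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0
```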
This downloads around 8.5GB of model data and starts an interactive chat interface.

Once the model has been fetched, LLM can talk to it as well. I prefer using LLM for experiments because it logs everything to SQLite for later exploration.

I installed the plugin and ran a prompt through it.
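Along these lines, assuming the same Hugging Face model identifier as the Ollama pull above:

```bash
# Install the plugin that exposes Ollama-served models to LLM
llm install llm-ollama

# Run a prompt against the distilled model; the response is logged to SQLite
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-8B-GGUF:Q8_0' \
  'a joke about a pelican and a walrus who run a tea room together'
```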
Each response starts with a <think>...</think> pseudo-XML tag containing the chain of thought used to help generate the response. For "a joke about a pelican and a walrus who run a tea room together" that started like this:

Okay, so I need to come up with a joke about a pelican and a walrus running a tea room together. Hmm, that's an interesting combination. Let me think about how these two characters might interact in a humorous situation.

First, let's consider their characteristics. Pelicans are known for their long beaks and webbed feet, often seen near the beach or water. Walruses have big teeth, thick fur, and they're usually found in colder climates, like icebergs or snowy areas. So, combining these two into a tea room setting is already a funny image.

[...]

It then thought for 20 paragraphs before outputting the joke! Here's the full transcript and here's the output at the end, after the </think> tag:

The Joke:

A pelican and a walrus decide to open a quaint little tea room together. The walrus, with its big size, struggles to find comfortable chairs, so it sits on the table by accident, knocking over the teapot. Meanwhile, the pelican, trying to help, uses its beak to place saucers on the table, causing a few spills.

After a series of comical mishaps, the walrus looks up and says with a smile, "This isn't so fishy anymore." The pelican smirks and remarks, "Maybe not, but we do have a lot of krill in our tea!"

Explanation:

- Setup: Pelican and Walrus run an unlikely tea room. Their size difference leads to comedic chaos.
- Punchline: "This isn't so fishy anymore" is a pun on the situation becoming less absurd. The follow-up joke introduces "krill," a play on words that ties into both marine life ("fish") and tea, making it humorous while fitting the theme.

So yeah, the joke is terrible. But the process of getting there was such an interesting insight into how these new models work.

This was a relatively small 8B model. I'm looking forward to trying out the Llama 70B version, which isn't yet available in a GGUF I can run with Ollama. Given the strength of Llama 3.3 70B - currently my favourite GPT-4 class model that I've run on my own machine - I have high expectations.

Update 21st January 2025: I got this quantized version of that Llama 3.3 70B R1 distilled model working, a 34GB download.
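Again it was an Ollama pull of a Hugging Face GGUF, something like this, where the exact Q3_K_M quantization tag is an assumption:

```bash
# Fetch a ~34GB quantized build of the 70B distilled model and chat with it
ollama run hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M
```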
Can it draw a pelican?

I tried my classic Generate an SVG of a pelican riding a bicycle prompt too.
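Via LLM that looks something like the following, assuming the 70B model identifier from the update above:

```bash
# Ask the local distilled model for an SVG drawing
llm -m 'hf.co/unsloth/DeepSeek-R1-Distill-Llama-70B-GGUF:Q3_K_M' \
  'Generate an SVG of a pelican riding a bicycle'
```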
It did not do very well. It looked to me like it got the order of the elements wrong, so I followed up with:

the background ended up covering the rest of the image

It thought some more and gave me this:

Just like the earlier joke, the chain of thought in the transcript was far more interesting than the end result.

Other ways to try DeepSeek-R1

If you want to try the model out without installing anything you can do so using chat.deepseek.com - you'll need to create an account (sign in with Google, use an email address or provide a Chinese +86 phone number) and then select the "DeepThink" option below the prompt input box.

DeepSeek offer the model via their API, using an OpenAI-compatible format. You can access that via LLM by dropping a model definition into your extra-openai-models.yaml configuration file.
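A sketch of that entry, assuming DeepSeek's standard API base URL and LLM's documented keys for this file:

```yaml
# Register DeepSeek's OpenAI-compatible endpoint as a model LLM can call
- model_id: deepseek-reasoner
  model_name: deepseek-reasoner
  api_base: "https://api.deepseek.com"
  api_key_name: deepseek
```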
Then run llm keys set deepseek and paste in your API key, then use llm -m deepseek-reasoner 'prompt' to run prompts.
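For example, with the prompt here being just an illustration:

```bash
# Store the DeepSeek API key under the name referenced by api_key_name above
llm keys set deepseek

# Run a prompt against the hosted deepseek-reasoner model
llm -m deepseek-reasoner 'a joke about a pelican and a walrus who run a tea room together'
```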
This won't show you the reasoning tokens, sadly. Those are returned by the API (example here) but LLM doesn't yet have a way to display them.