Slightly off topic, but the writing on this article is horrible. Optimizing for Google engagement, it seems. Ironically, an AI would probably have produced something vastly more readable.
Aww come on. There’s plenty to be mad at Zuckerberg about, but releasing Llama under a semi-permissive license was a massive gift to the world. It gave independent researchers access to a working LLM for the first time. For example, Deepseek got their start messing around with Llama derivatives back in the day (though, to be clear, their MIT-licensed V3 and R1 models are not Llama derivatives).
As for open training data, it’s a good ideal but I don’t think it’s a realistic possibility for any organization that wants to build a workable LLM. These things use trillions of tokens in training, and no matter how hard you try to clean the data, there’s definitely going to be something lawyers can find to sue you over. No organization is going to open itself up to that liability. And if you gimp your data set, you get a dumb AI that nobody wants to use.
It’s definitely a trend. More and more top Chinese students are also opting to stay in China for university, rather than going to the US or Europe to study. It’s in part due to a good thing, i.e. the improving quality of China’s universities and top companies. But I think it’s a troubling development for China overall. One of China’s strengths over the past few decades has been their people’s eagerness to engage with the outside world, and turning inward will not be beneficial for them in the long run.
Chinese or not, it’s MIT licensed. A world where any company can spend ~$10k to locally deploy a frontier reasoning model is very different from one where you can only get AI via API access to a handful of US tech giants.
Rare earths to begin with. There will be more demands.
Base models are general purpose language models, mainly useful for AI researchers and people who want to build on top of them.
Instruct or chat models are chatbots. They are made by fine-tuning base models.
The V3 models linked by OP are Deepseek’s non-reasoning models, similar to Claude or GPT-4o. These are the “normal” chatbots that reply with whatever comes to their mind. Deepseek also has a reasoning model, R1. Such models take time to “think” before supplying their final answer; they tend to perform better on stuff like math problems, at the cost of taking longer to answer.
It should be mentioned that you probably won’t be able to run these models yourself unless you have a data center style rig with 4-5 GPUs. The Deepseek V3 and R1 models are chonky beasts. There are smaller “distilled” forms of R1 that are possible to run locally, though.
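For a rough sense of scale, here is a back-of-the-envelope sketch. It assumes the 671B total parameter count from Deepseek's published model card (that figure is not stated above) and 80 GB of memory per GPU, and it counts the weights only, ignoring KV cache and activations. Treat the numbers as a lower bound, not a deployment guide.

```python
import math

def weight_memory_gb(n_params: float, bits_per_param: float) -> float:
    """Memory needed for the weights alone, in GB (1 GB = 10^9 bytes)."""
    return n_params * bits_per_param / 8 / 1e9

def gpus_needed(n_params: float, bits_per_param: float, gpu_gb: float = 80.0) -> int:
    """Minimum number of GPUs whose combined memory holds the weights."""
    return math.ceil(weight_memory_gb(n_params, bits_per_param) / gpu_gb)

# Assumption: 671e9 total parameters (from the public model card).
# Assumption: 80 GB per GPU (the default gpu_gb above).
V3_PARAMS = 671e9

for bits in (16, 8, 4):
    gb = weight_memory_gb(V3_PARAMS, bits)
    print(f"{bits:>2}-bit weights: {gb:7.1f} GB, at least {gpus_needed(V3_PARAMS, bits)} GPUs")
```

Under these assumptions, the "4-5 GPUs" figure only works out with fairly aggressive (roughly 4-bit) quantization; at higher precision the count goes up quickly.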
Going after US tech is an obvious move. Digital services taxes, etc.
“Via Greenland” makes no sense. The trouble with Canada-Europe trade is that Canada unfortunately lacks a good port on its east coast (certainly nothing comparable to Vancouver in the west). For the foreseeable future, if the trade dispute with the US drags on, Canada’s best bet is to expand its trade with Asia.
Intriguingly, there’s reason to believe the R1 distills are nowhere close to their peak performance. In the R1 paper they say that the models are released as proofs of concept of the power of distillation, and the performance can probably be improved by doing an additional reinforcement learning step (like what was done to turn V3 into R1). But they said they basically couldn’t be bothered to do it and are leaving it for the community to try.
2025 is going to be very interesting in this space.
They have no armed forces. Panama always assumed that because of the importance of the canal, the US would step in to defend them in case of external aggression. LOL.
No AI org of any significant size will ever disclose its full training set, and it’s foolish to expect such a standard to be met. There is just too much liability. No matter how clean your data collection procedure is, there’s no way to guarantee the data set with billions of samples won’t contain at least one thing a lawyer could zero in on and drag you into a lawsuit over.
What Deepseek did, which was full disclosure of methods in a scientific paper, release of weights under MIT license, and release of some auxiliary code, is as much as one can expect.
By Taiwanese law, TSMC isn’t allowed to move cutting edge processes to its US plant. The overseas operations have to be at least one gen behind.
From a strategic point of view, it makes sense for the Taiwan government to do this. They don’t want the US to suck them dry then cut a deal with the mainland.
Looking forward to the Dragon Age go-kart racing game.
Looks like Alex Kurtzman has done it again!
Also, the release of R1 under the MIT license means that in principle anyone can use R1 to generate synthetic training sets for improving other (non-reasoning) models. This may be a real game changer.
The one fly in the ointment is that Deepseek didn’t deign to share details of their synthetic data generation procedure. But they are already way more transparent than any other non-academic AI lab, so it’s hard to get mad at them over this.
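Since Deepseek's own procedure is not public, the following is only a generic sketch of what "use a reasoning model to build a synthetic training set" could look like. Everything here is hypothetical: `fake_reasoning_model` is a stub standing in for a real call to R1, and `non_empty` is a placeholder for whatever quality filter you would actually want.

```python
import json
from typing import Callable, Iterable

def build_synthetic_set(
    prompts: Iterable[str],
    generate: Callable[[str], str],
    keep: Callable[[str, str], bool],
) -> list[dict]:
    """Query the teacher model on each prompt and keep completions that pass the filter."""
    records = []
    for prompt in prompts:
        completion = generate(prompt)
        if keep(prompt, completion):
            records.append({"prompt": prompt, "completion": completion})
    return records

def to_jsonl(records: list[dict]) -> str:
    """Serialize records as JSON Lines, a common format for fine-tuning data."""
    return "\n".join(json.dumps(r, ensure_ascii=False) for r in records)

# Hypothetical stub: replace with a real call to a locally hosted reasoning model.
def fake_reasoning_model(prompt: str) -> str:
    return f"(stub answer to: {prompt})"

# Hypothetical filter: a real pipeline would check correctness, length, format, etc.
def non_empty(prompt: str, completion: str) -> bool:
    return bool(completion.strip())

dataset = build_synthetic_set(
    ["What is 2 + 2?", "Integrate x^2."], fake_reasoning_model, non_empty
)
print(to_jsonl(dataset))
```

The filter step is where most of the real work would go; an unfiltered dump of model output is unlikely to make a good training set.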
Try it out for yourself: https://chat.deepseek.com/
It can understand LaTeX as well as output it. In my limited testing on sample physics problems, it performs pretty well. It also scored 100% on the 2023 A Level maths exam.
It’s MIT licensed, so anyone is free to go about decensoring it. There are already “abliterated” (decensored) variants uploaded to huggingface, at least for the distilled models.
This procedure also decensors stuff that western models routinely censor. So ironically these Chinese open source models are giving us the most free speech friendly LLMs around.
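For anyone curious what "abliteration" does mechanically, here is a toy sketch of the core idea as I understand it: estimate a "refusal direction" as the difference between mean activations on refused and complied prompts, then project that direction out. The vectors below are made up for illustration; real tools work on actual hidden states (or the weight matrices themselves) of a specific model, and I am not claiming this matches any particular uploaded variant.

```python
import math

Vector = list[float]

def mean(vectors: list[Vector]) -> Vector:
    """Component-wise mean of a list of equal-length vectors."""
    n = len(vectors)
    return [sum(col) / n for col in zip(*vectors)]

def refusal_direction(refused: list[Vector], complied: list[Vector]) -> Vector:
    """Unit vector pointing from the mean 'complied' activation to the mean 'refused' one."""
    diff = [a - b for a, b in zip(mean(refused), mean(complied))]
    norm = math.sqrt(sum(d * d for d in diff))
    return [d / norm for d in diff]

def ablate(activation: Vector, direction: Vector) -> Vector:
    """Remove the component of the activation that lies along the (unit) direction."""
    dot = sum(a * d for a, d in zip(activation, direction))
    return [a - dot * d for a, d in zip(activation, direction)]

# Made-up 3-dimensional "activations", purely for illustration.
refused = [[2.0, 1.0, 0.0], [2.0, -1.0, 0.0]]
complied = [[0.0, 1.0, 0.0], [0.0, -1.0, 0.0]]

r = refusal_direction(refused, complied)
cleaned = ablate([3.0, 0.5, -1.0], r)
print(r, cleaned)
```

Because the projection removes whatever lies along that one direction, it does not care why the model was refusing, which is consistent with the point above that western-style refusals get stripped too.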
It’s an interesting subject. If not for Beijing’s heavy hand, could Chinese internet companies have flourished much more and become international tech giants? Maybe, but there is one obvious counterpoint: where are the European tech giants? In an open playing field, it looks like American tech giants are pretty good at buying out or simply crushing any nascent competitors. If the Chinese did not have their censorship or great firewall, maybe the situation would have been like Europe, where the government tries to impose some rules, but doesn’t really have much traction, and everyone just ends up using Google, Amazon, Facebook, etc.
There are no permanent alliances in a multipolar world. The EU is realizing the danger of relying on a stronger partner, when said partner stops thinking of the relationship as an alliance and starts treating it as subordination.
Aside from national pride or security, one issue is that there’s a Taiwan law requiring TSMC to keep latest gen fabs in Taiwan. So if TSMC takes over Intel fabs, Intel’s US operations will never be able to reach latest gen (not that Intel is currently in good shape to achieve this, of course).