• stratoscaster@lemmy.world
    8 months ago

    It literally is just statistics… wtf are you on about? It’s all just weights and matrix multiplication and tokenization.

    • Redex@lemmy.world
      8 months ago

      Well, on one hand, yes: when you’re training it, you’re telling it to try to mimic the input as closely as possible. But the result is still a set of weights that isn’t going to reproduce everything exactly, because it just isn’t possible to store everything in the limited amount of entropy the weights provide.
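As a rough back-of-the-envelope sketch of that capacity argument: the numbers below are hypothetical and chosen only for illustration (they are not taken from any real model), and the data size is the raw, uncompressed size, which overstates how much information the text actually carries.

```python
import math

# All four values are assumptions for illustration, not real model figures.
n_params = 1_000_000_000            # assumed number of weights
bits_per_param = 16                 # assumed storage precision per weight
n_train_tokens = 1_000_000_000_000  # assumed number of training tokens
vocab_size = 50_000                 # assumed tokenizer vocabulary size

# Hard upper bound on what the weights can store.
weight_bits = n_params * bits_per_param

# Raw size of the training tokens: log2(vocab_size) bits per token.
data_bits = n_train_tokens * math.log2(vocab_size)

ratio = data_bits / weight_bits
print(f"weight budget:       {weight_bits:.3e} bits")
print(f"raw training tokens: {data_bits:.3e} bits")
print(f"raw data is about {ratio:.0f}x larger than the weight budget")
```

Under these assumed sizes the raw data exceeds the weight budget by a wide margin, which is the sense in which verbatim storage of everything is ruled out.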

      In the end, human brains aren’t that dissimilar; we also just have some weights and parameters (neurons, how sensitive they are and how many inputs they have) that then output something.

      I’m not convinced that, in principle, this is far from how human brains could work (there are a lot of minute differences, but the end result is the same). I think a sufficiently large, well-trained and well-configured model would be able to work like a human brain.

    • General_Effort@lemmy.world
      8 months ago

      It’s all just weights and matrix multiplication and tokenization

      See, none of these is statistics, as such.

      Weights are maybe the closest, but they are supposed to represent the strength of a neural connection. This was originally inspired by neurobiology.

      Matrix multiplication is linear algebra and encountered in lots of contexts.

      Tokenization is a thing from NLP. It’s not what one would call a statistical method.

      So you can see where my advice comes from.

      Certainly there is nothing here that implies any kind of averaging going on.
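For readers following along, here is a minimal sketch of the three ingredients named above (tokenization, weights, matrix multiplication), in pure Python. The vocabulary, embedding values, and output weights are made-up toy numbers, and the final softmax step is an assumption about how scores are usually turned into next-token probabilities. It only shows the mechanics and does not settle whether that counts as "statistics".

```python
import math

def tokenize(text, vocab):
    """Map whitespace-separated words to integer ids (toy tokenizer)."""
    return [vocab[word] for word in text.lower().split()]

def matvec(matrix, vector):
    """Multiply a matrix (given as a list of rows) by a vector."""
    return [sum(w * x for w, x in zip(row, vector)) for row in matrix]

def softmax(logits):
    """Turn arbitrary scores into non-negative numbers that sum to 1."""
    peak = max(logits)  # subtract the max for numerical stability
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical toy vocabulary and weights, for illustration only.
vocab = {"the": 0, "cat": 1, "sat": 2}
embeddings = [[1.0, 0.0], [0.0, 1.0], [0.5, 0.5]]          # one row per token id
output_weights = [[0.2, -0.1], [0.7, 0.3], [-0.4, 0.9]]   # one row per vocab entry

token_ids = tokenize("The cat", vocab)
last_hidden = embeddings[token_ids[-1]]       # use only the last token, for brevity
logits = matvec(output_weights, last_hidden)  # linear algebra step
next_token_probs = softmax(logits)            # scores -> probability distribution

print(token_ids)
print(next_token_probs)
```

A real model stacks many such layers with nonlinearities in between; this sketch keeps a single matrix-vector product so each ingredient stays visible.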

        • General_Effort@lemmy.world
          8 months ago

          Why would averaging lead to repetition of stereotypes?

          Anyway, it’s hard to say why LLMs output what they do. GPTisms may have to do with the system prompt, or they may result from the fine-tuning. Either way, they don’t seem very internet-average to me.

          • Natanael@slrpnk.net
            8 months ago

            The TLDR is that pathways between nodes corresponding to frequently seen patterns (stereotypical sentences) get strengthened more than others, and therefore it becomes more likely that such a pathway gets activated over others when the model is given a prompt. These strengths correspond to probabilities.
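The claim that strengths end up corresponding to probabilities can be illustrated with a toy case. This is a sketch under strong assumptions: a single fixed context, three made-up continuations, one logit per continuation, and plain gradient descent on cross-entropy. Real LLM training is far more involved, so treat it as an illustration of the tendency rather than a description of any actual model.

```python
import math
from collections import Counter

def softmax(logits):
    """Turn scores into a probability distribution."""
    peak = max(logits)
    exps = [math.exp(z - peak) for z in logits]
    total = sum(exps)
    return [e / total for e in exps]

# Hypothetical continuations observed after one fixed context.
observed = ["regards"] * 6 + ["cheers"] * 3 + ["ahoy"] * 1
vocab = sorted(set(observed))  # ['ahoy', 'cheers', 'regards']
counts = Counter(observed)
frequencies = [counts[word] / len(observed) for word in vocab]

# One trainable "strength" (logit) per continuation, all starting equal.
logits = [0.0] * len(vocab)
learning_rate = 0.5

for _ in range(2000):
    probs = softmax(logits)
    # Gradient of the average cross-entropy loss w.r.t. each logit
    # is (predicted probability - observed frequency).
    grads = [p - f for p, f in zip(probs, frequencies)]
    logits = [z - learning_rate * g for z, g in zip(logits, grads)]

learned = softmax(logits)
for word, freq, prob in zip(vocab, frequencies, learned):
    print(f"{word:8s} observed={freq:.3f} learned={prob:.3f}")
```

In this toy setting the learned probabilities converge to the observed frequencies, so the continuation seen most often gets the largest strength.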

            Have you seen how often they’ll sign a requested text with a name placeholder? Have you seen the typical grammar they use? The way they write is a hybridization of the most common types of texts they have seen in samples, weighted by occurrence (which is a statistical property).

            It’s like how mixing dog breeds often results in something which doesn’t look exactly like either breed but which has features from every breed. GPT/LLM models mix in stuff like academic writing, redditisms and stackoverflowisms, quoraisms, linkedin-postings, etc. You get this specific dryish text full of hedging language and mixed types of formalisms, a certain answer structure, etc.

              • Natanael@slrpnk.net
                8 months ago

                A) I’ve not yet seen evidence to the contrary

                B) you do know there’s a lot of different definitions of average, right? The centerpoint of multiple vectors is one kind of average. The median of online writing is an average. The most common vocabulary, the most common sentence structure, the most common formulation of replies, etc, those all form averages within their respective problem spaces. It displays these properties because it has seen them so often in samples, and then it blends them.
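To make the "different definitions of average" point concrete, here is a small sketch using Python's standard `statistics` module plus a hand-written centroid helper. The sentence lengths and vectors are invented sample data, and `centroid` is a hypothetical helper name.

```python
from statistics import mean, median, mode

# Hypothetical sentence lengths (in words) from a made-up sample of replies.
sentence_lengths = [4, 7, 7, 9, 12, 7, 30]

avg_length = mean(sentence_lengths)       # arithmetic mean, pulled up by the outlier 30
middle_length = median(sentence_lengths)  # middle value of the sorted sample
common_length = mode(sentence_lengths)    # most frequently occurring value

def centroid(vectors):
    """Component-wise mean of equally sized vectors (their center point)."""
    n = len(vectors)
    return [sum(component) / n for component in zip(*vectors)]

# Hypothetical 2-dimensional feature vectors.
vectors = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0], [2.0, 2.0]]
center = centroid(vectors)

print(avg_length, middle_length, common_length, center)
```

The three scalar averages differ on the same sample, which is why it matters which notion of "average" is meant in the argument.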

                • General_Effort@lemmy.world
                  8 months ago

                  I accidentally clicked reply, sorry.

                  B) you do know there’s a lot of different definitions of average, right?

                  I don’t think that any definition applies to this. But I’m no expert on averages. In any case, the training data is not representative of the internet or anything. It’s also not training equally on all data and not only on such text. What you get out is not representative of anything.

                  • Natanael@slrpnk.net
                    8 months ago

                    You don’t need it to be an average of the real world to be an average. I can calculate as many average values as I want from entirely fictional worlds. It’s still a type of model that favors what it sees often over what it sees rarely. That’s a form of embedded probability, which corresponds to a form of average.
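One way to see how "favoring what is seen often" relates to an average: if each observed token is encoded as a one-hot vector, the plain arithmetic mean of those vectors is exactly the relative-frequency distribution. The tokens below come from an invented sample, and this is only a sketch of that identity, not a claim about how any particular LLM is trained.

```python
from collections import Counter

# Hypothetical tokens from an entirely fictional corpus.
tokens = ["elf", "orc", "elf", "dragon", "elf", "orc", "elf", "elf"]
vocab = sorted(set(tokens))  # ['dragon', 'elf', 'orc']

def one_hot(token, vocab):
    """Vector with 1.0 at the token's position and 0.0 elsewhere."""
    return [1.0 if token == entry else 0.0 for entry in vocab]

encoded = [one_hot(token, vocab) for token in tokens]

# Component-wise arithmetic mean of all one-hot vectors.
mean_vector = [sum(column) / len(encoded) for column in zip(*encoded)]

# Relative frequency of each vocabulary entry, computed by counting.
counts = Counter(tokens)
relative_freq = [counts[entry] / len(tokens) for entry in vocab]

print(dict(zip(vocab, mean_vector)))
print(mean_vector == relative_freq)
```

The data does not need to describe the real world for this to hold; the mean is taken over whatever samples were seen.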

                • General_Effort@lemmy.world
                  8 months ago

                  A) I’ve not yet seen evidence to the contrary

                  You should worry more about whether you have seen evidence that supports what you are saying. So, what kind of evidence do you want? A tutorial on coding neural nets? The math? Video or text?

                  • Natanael@slrpnk.net
                    8 months ago

                    Text explaining why the neural network representation of common features (typically with weighted proportionality to their occurrence) does not meet the definition of a mathematical average. Does it not favor common response patterns?