11 comments

  • TheTaytay 1 hour ago
    Woah, this is really neat. My first step for many new libraries is to clone the repo, launch Claude code, and ask it to write good documentation for me. This would save a lot of steps for me!
  • manofmanysmiles 2 hours ago
    I love it! I effectively achieve similar results by asking Cursor lots of questions!

    Like at least one other person in the comments mentioned, I would like a slightly different tone.

    Perhaps a good feature would be a "style template" that could be chosen to match your preferred writing style.

    I may submit a PR though not if it takes a lot of time.

    • zh2408 2 hours ago
      Thanks—would really appreciate your PR!
  • lionturtle 10 minutes ago
    >:( :3
  • chairhairair 3 hours ago
    A company (mutable ai) was acquired by Google last year for essentially doing this but outputting a wiki instead of a tutorial.
  • CalChris 1 hour ago
    Do one for LLVM and I'll definitely look at it.
  • badmonster 5 hours ago
    do you have plans to expand this to include more advanced topics like architecture-level reasoning, refactoring patterns, or onboarding workflows for large-scale repositories?
    • zh2408 5 hours ago
      Yes! This is an initial prototype. Good to see the interest, and I'm considering digging deeper by creating more tailored tutorials for different types of projects. E.g., if we know it's web dev, we could generate tutorials based more on request flows, API endpoints, database interactions, etc. If we know it's a long-term maintained project, we could focus on identifying refactoring patterns.
      • kristopolous 5 hours ago
        Have you ever seen komment.ai? If so, did you run into any limitations of the product?

        I haven't used it, but it looks like it's in the same space and I've been curious about it for a while.

        I've tried my own homebrew solutions: creating embedding databases by having something like aider or simonw's llm build an ingest JSON from every function, using that as a RAG in qdrant to produce an architecture document, then using that document to do contextual inline function commenting and generate a Doxygen site, and finally exposing all of it once again as an MCP with playwright, hooked up through roo.

        It's a weird pipeline and it's been ok, not great but ok.

        I'm looking into perplexica as part of the chain, mostly as a negation tool
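
        The pipeline described above (extract every function into an ingest JSON, embed it, retrieve against it) could be sketched roughly as follows. This is a self-contained toy, not anyone's actual code: a bag-of-words cosine similarity stands in for a real embedding model like all-MiniLM-L6-v2, and a plain list stands in for a vector store like qdrant; the `SOURCE` module is made up for illustration.

```python
# Toy sketch of an "ingest JSON -> embedding store -> retrieval" pipeline.
# A word-count vector stands in for a real sentence-embedding model.
import ast
import math
from collections import Counter

def extract_functions(source: str):
    """Emit one record per top-level function, like an ingest JSON."""
    tree = ast.parse(source)
    return [
        {"name": node.name,
         "doc": ast.get_docstring(node) or "",
         "code": ast.get_source_segment(source, node)}
        for node in tree.body if isinstance(node, ast.FunctionDef)
    ]

def embed(text: str) -> Counter:
    # Stand-in embedding: bag of lowercase words.
    return Counter(text.lower().split())

def cosine(a: Counter, b: Counter) -> float:
    dot = sum(a[t] * b[t] for t in a)
    na = math.sqrt(sum(v * v for v in a.values()))
    nb = math.sqrt(sum(v * v for v in b.values()))
    return dot / (na * nb) if na and nb else 0.0

def retrieve(records, query, k=1):
    # Rank stored function records against the query vector.
    q = embed(query)
    ranked = sorted(records,
                    key=lambda r: cosine(q, embed(r["doc"] + " " + r["code"])),
                    reverse=True)
    return ranked[:k]

SOURCE = '''
def connect(host):
    "Open a socket connection to host."
    ...

def parse_config(path):
    "Read the JSON config file at path."
    ...
'''

records = extract_functions(SOURCE)
best = retrieve(records, "read configuration file")[0]
print(best["name"])  # prints "parse_config"
```

        A real version would swap `embed` for a sentence-transformers model and store the vectors in qdrant, but the retrieve-then-summarize shape is the same.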

        • zh2408 5 hours ago
          No, I haven't, but I will check it out!

          One thing to note is that the tutorial generation depends largely on Gemini 2.5 Pro. Its code understanding ability is very good, combined with its large 1M context window for a holistic understanding of the code. This leads to very satisfactory tutorial results.

          However, Gemini 2.5 Pro was released just late last month. Since Komment.ai launched earlier this year, I don't think models at that time could generate results of that quality.

          • kristopolous 4 hours ago
            I've been using Llama 4 Maverick through OpenRouter. Gemini was my go-to, but I switched basically the day it came out to try it out.

            I haven't switched back. At least for my use cases it's been meeting my expectations.

            I haven't tried Microsoft's new 1.58-bit model, but it may be a great swap-out for sentence embeddings via the legendary all-MiniLM-L6-v2.

            I found that if I'm unfamiliar with the knowledge domain, I'm mostly using AI; but then as I dive in, the ratio of AI to human changes to the point where it's AI at 0 and it's all human.

            Basically, AI wins at day 1 but isn't any better at day 50. If this can change, then it's the next step.

            • zh2408 4 hours ago
              Yeah, I'd recommend trying Gemini 2.5 Pro. I know early Gemini models weren't great, but the recent one is really impressive in terms of coding ability. This project is kind of designed around that recent breakthrough.
              • kristopolous 1 hour ago
                I've used it, I used to be a huge booster! Give llama 4 maverick a try, really.
  • Retr0id 3 hours ago
    The overview diagrams it creates are pretty interesting, but the tone/style of the AI-generated text is insufferable to me - e.g. https://the-pocket.github.io/Tutorial-Codebase-Knowledge/Req...
  • throwaway314155 1 hour ago
    I suppose I'm just a little bit bothered by your saying you "built an AI" when all the heavy lifting is done by a pretrained LLM. Saying you made an AI-based program or hell, even saying you made an AI agent, would be more genuine than saying you "built an AI" which is such an all-encompassing thing that I don't even know what it means. At the very least it should imply use of some sort of training via gradient descent though.
    • j45 7 minutes ago
      It's an application of AI, which is just software, applied to solve a problem or need.
  • ryao 4 hours ago
    I would find this more interesting if it made tutorials out of the Linux, LLVM, OpenZFS and FreeBSD codebases.
    • wordofx 4 hours ago
      I would find this comment more interesting if it didn’t dismiss the project just because you didn’t find it valuable.
    • zh2408 3 hours ago
      The Linux repository has ~50M tokens, which goes beyond the 1M token limit for Gemini 2.5 Pro. I think there are two paths forward: (1) decompose the repository into smaller parts (e.g., kernel, shell, file system, etc.), or (2) wait for larger-context models with a 50M+ input limit.
      • rtolsma 2 hours ago
        You can use the AST for some languages to identify modular components that are smaller and can fit into the 1M window
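
        The AST idea above can be sketched in a few lines: walk a module's top-level definitions and pack them into chunks that each fit a context budget. This is a hypothetical illustration, not the project's code; `estimate_tokens` is a crude 4-chars-per-token heuristic standing in for a real tokenizer, and the sample module is generated on the fly.

```python
# Sketch: split a module's top-level definitions into chunks that each
# fit a (hypothetical) context-window token budget.
import ast

def estimate_tokens(text: str) -> int:
    # Crude heuristic (~4 chars per token); a real pipeline would
    # use the model's actual tokenizer.
    return max(1, len(text) // 4)

def chunk_module(source: str, budget: int):
    tree = ast.parse(source)
    chunks, current, used = [], [], 0
    for node in tree.body:
        seg = ast.get_source_segment(source, node) or ""
        cost = estimate_tokens(seg)
        if current and used + cost > budget:
            # Current chunk is full; start a new one.
            chunks.append("\n\n".join(current))
            current, used = [], 0
        current.append(seg)
        used += cost
    if current:
        chunks.append("\n\n".join(current))
    return chunks

# Synthetic module with eight small top-level functions.
SOURCE = "\n".join(f"def f{i}():\n    return {i} * {i}" for i in range(8))
chunks = chunk_module(SOURCE, budget=20)
print(len(chunks))
```

        For a 50M-token repository the same packing would run over directories or subsystems rather than one file, feeding each chunk to the model separately.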