We ❤️ Open Source
A community education resource
Why grabbing a model and playing won’t get you what you want
Machine learning is a science problem requiring fundamentals before syntax.
Machine learning is not a software problem; it’s a science problem that happens to use code. In this episode, Juan Gomez Lagandera, a master’s student at the time of recording and now pursuing a PhD at Vanderbilt University, joins the We Love Open Source podcast to share why you can’t approach ML like traditional software engineering, how optimization requires understanding math and hardware first, and why getting fundamentals right matters more than syntax.
The key distinction the open source community needs to understand: We use code to solve scientific problems in machine learning. Once you grasp this fundamental difference, you can start talking about what open source tools for researchers actually look like. Think of machine learning as information compression: You fit a very large dataset into a probability distribution so that, given a set of observations, you can predict what the next state is likely to be.
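To make the compression idea concrete, here is a minimal sketch in plain Python. It is not from the episode: the weather sequence is hypothetical toy data, and the helper names `fit_transitions` and `next_state_distribution` are made up for illustration. It squeezes a sequence of observations into a small table of transition probabilities, then uses that table to guess the most likely next state.

```python
from collections import Counter, defaultdict


def fit_transitions(states):
    """Count how often each state is followed by each other state."""
    counts = defaultdict(Counter)
    for current, following in zip(states, states[1:]):
        counts[current][following] += 1
    return counts


def next_state_distribution(counts, current):
    """Normalize the raw follower counts for one state into probabilities."""
    followers = counts.get(current, Counter())
    total = sum(followers.values())
    return {state: n / total for state, n in followers.items()}


# Hypothetical toy data: a short run of daily weather observations.
observations = ["sun", "sun", "rain", "sun", "sun", "sun", "rain", "rain", "sun"]

# "Compress" the whole sequence into a table of transition counts.
model = fit_transitions(observations)

# Given today's state, look up the distribution over tomorrow's state.
dist = next_state_distribution(model, "sun")
prediction = max(dist, key=dist.get)

print(dist)
print(prediction)
```

The table is far smaller than the data it was built from, and that is the compression. Real models replace the counting with learned parameters, but the question stays the same: which distribution best summarizes the data, and what does it say comes next?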
ML optimization requires a top-down approach rather than bottom-up. First understand what problem you’re trying to solve. If you’re working with a pre-trained model, what’s next?
- Downstream task fine-tuning?
- Model alignment?
- Do you need to go back to the data?
- Back to the math?
- Is it a hardware issue or software issue?
Understanding your need and where you’re starting from is key. You need to be very clear where you want to get to.
If you’re trying to get into ML and AI, you can’t approach it like traditional software engineering. Think about how the math plays a role, how the hardware plays a role. Get good at understanding fundamentals of the problem, not syntax.
Even if you don’t understand libraries like Keras or PyTorch, you can use them to ask questions. But understand how the math works, why models behave in specific ways, why your data has to be distributed in specific ways. You’ll get some results if you just grab a model and start playing, but you won’t get what you want unless you understand how things run and why certain building blocks in your architecture hinder development.
Key takeaways
- Machine learning is a science problem, not a software problem: Understanding this distinction changes your approach. We use code to solve scientific problems involving math, hardware, and probability distributions.
- ML optimization requires a top-down approach: Understand what problem you’re solving first. Downstream task fine-tuning or model alignment? Hardware or software issue? Know where you’re starting and where you want to go.
- Understand fundamentals before syntax: Don’t just grab libraries like Keras or PyTorch and start playing. Understand how the math works, why models behave in specific ways, and why data distributions matter; only then will you get the results you want.
Juan’s message: Get the fundamentals right in math and hardware before diving into ML, because it’s a science problem that requires understanding why things work, not just syntax.