1. Introduction to Git & Version Control
What Is Git & Why Version Control Matters
Git is a distributed version control system that tracks changes to files and coordinates work among multiple developers. Linus Torvalds created it in 2005 to manage Linux kernel development. Today Git is the most widely used version control system, and it sits behind millions of repositories hosted on GitHub, GitLab, and Bitbucket. Learning Git is not just about memorizing commands; it is about understanding how modern software development works. Most professional developers use Git daily, and fluency with it supports collaboration, code quality, and career growth.
The key insight is that Git is fundamentally a content-addressable filesystem with a VCS interface. Every file version, directory listing, and commit is stored as an object named by the hash of its content. Objects are immutable: changing the content produces a new object with a new hash rather than altering the old one. That is why you can branch, merge, and even rewrite history with confidence, because the original objects remain in place until Git garbage-collects them. Unlike centralized version control systems such as SVN or CVS, Git is distributed, so every clone contains the complete history. You can work offline, branch freely, and collaborate without a central server bottleneck.
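Content addressing can be tried directly from a terminal. The following is a minimal sketch, assuming Git is installed and using a scratch directory from mktemp; the text "hello world" is only an illustrative value.

```shell
# Work in a throwaway repository so nothing real is touched.
repo=$(mktemp -d)
cd "$repo"
git init -q

# Store some content as an object; Git prints the hash that names it.
id=$(echo 'hello world' | git hash-object -w --stdin)
echo "object id: $id"

# Look the content up again using nothing but its hash.
git cat-file -p "$id"

# Hashing identical content again yields the same id.
same=$(echo 'hello world' | git hash-object --stdin)
echo "same id:   $same"
```

No file name is involved at any point here: the object is found by its hash alone, which is what "content-addressable" means.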
This module covers the fundamentals of version control, Git's architecture, and how Git differs from other VCS tools. You'll learn about repositories, commits, branches, and the distributed nature of Git. These concepts matter whether you're a solo developer or part of a large team. By the end of this module, you'll have a solid mental model of how Git works under the hood, so the commands you learn later will make intuitive sense.
🏗️ Git's Architecture: The Object Model
Git's architecture is built on four building blocks: blobs, trees, commits, and refs. The first three are object types stored in Git's object database; refs are not objects but named pointers into it. (Git has one more object type, the annotated tag, which this module does not cover.) Blobs store file contents, without the file name. Trees store directory structure: each entry maps a name to a blob or to another tree, so a tree represents a folder and its contents. Commits store metadata (author, date, message) plus a pointer to the root tree and to any parent commits; each one is a snapshot of your project at a point in time. Refs are names that point to commits; branches and tags are refs. Together these four pieces give you a complete mental model of Git's internal data structure.
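To see these pieces in a real repository, git cat-file can report the type of any object. This is a small sketch, assuming a scratch directory and a placeholder identity ("Example User"); it inspects the commit, tree, and blob behind a single commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo 'Hello, Git!' > README.md
git add README.md
git commit -q -m "Add README"

# HEAD resolves to a commit object.
git cat-file -t HEAD              # commit
# The commit points to a root tree.
git cat-file -t 'HEAD^{tree}'     # tree
# The tree's entry for README.md is a blob.
git cat-file -t HEAD:README.md    # blob

# Pretty-print each object to see what it actually stores.
git cat-file -p HEAD
git cat-file -p 'HEAD^{tree}'
git cat-file -p HEAD:README.md
```

The pretty-printed commit shows a tree line, the author, and the message; the tree shows one entry naming README.md; the blob is just the file's text.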
Every object in Git is identified by a hash, by default SHA-1, written as 40 hexadecimal characters. The hash is computed from the object's content together with a short header giving its type and size, so if the content changes, the hash changes. This design provides integrity checking: if a stored file is corrupted, its content no longer matches its hash, and Git can detect the mismatch. Because each commit's hash also covers the hashes of its tree and its parents, altering anything in history changes the hash of every later commit, which makes tampering evident.
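The relationship between content and hash can be checked by hand. The sketch below assumes the default SHA-1 object format and the sha1sum utility from GNU coreutils (on other systems a different command may be needed); it recomputes a blob id outside of Git and compares the two.

```shell
# Ask Git for the id of a blob holding "hello world" plus a newline (12 bytes).
git_id=$(echo 'hello world' | git hash-object --stdin)

# Recompute it manually: hash the header "blob 12", a NUL byte, then the content.
manual_id=$(printf 'blob 12\0hello world\n' | sha1sum | cut -d' ' -f1)

echo "git:    $git_id"
echo "manual: $manual_id"

# Changing one character gives a completely different id.
other_id=$(echo 'hello World' | git hash-object --stdin)
echo "other:  $other_id"
```

The first two lines printed should match, and the third should not.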
Below is an interactive Git Visualizer Sandbox. Click the "Scenario Tutorials" tab to load the "Repository Initialization" scenario, then press Play to watch Git's object model come to life. You'll see blobs, trees, and commits appear as you step through the initialization process. Click any node in the visualizer to inspect its details. This hands-on visualization will make Git's abstract object model concrete and understandable.
When you commit, Git creates a commit object that points to a tree, and the tree points to the blobs for your files. Each commit also points to its parent commit or commits, so the commits form a directed acyclic graph (DAG). That graph is the history of your repository, and branches are just pointers to commits in it. Once you internalize this model, Git commands become intuitive, because you can see what each one does to the commit graph.
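The parent links and branch pointers described here can be observed with a few plumbing commands. This is an illustrative sketch in a scratch repository; the file name file.txt, the branch name experiment, and the identity are all placeholders.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo 'one' > file.txt
git add file.txt
git commit -q -m "First commit"
first=$(git rev-parse HEAD)

echo 'two' >> file.txt
git commit -q -am "Second commit"

# The second commit records the first as its parent.
parent=$(git rev-parse 'HEAD^')
echo "parent of HEAD: $parent"

# A branch is a name for one commit id; creating it copies no files.
git branch experiment
git rev-parse experiment
git rev-parse HEAD

# Draw the commit graph with branch labels attached.
git log --graph --oneline --decorate --all
```

Both rev-parse calls print the same id, since the new branch and the current branch point at the same commit until one of them moves.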
🔄 Version Control Systems: Centralized vs Distributed
To understand Git's importance, it helps to look at how version control systems evolved. A centralized VCS (like CVS or Subversion) has a single central server that stores all versions. Developers commit to this server, and everyone works against the same central repository. The problem is that if the server goes down or the network is unavailable, you can't commit and you can't browse history. The server is also a single point of failure: if it loses data and there is no backup, the project's history is gone.
A distributed VCS (like Git) gives every developer a full copy of the repository. When you clone, you get the entire history: all commits and all branches. This means you can work offline, commit locally, and sync when you're ready. There's no single point of failure, because every clone can serve as a backup. This architecture enables workflows that were awkward or impossible with centralized systems. You can experiment freely, create local branches, and rewrite history before sharing. This is why Git has become the standard for modern software development.
- Centralized VCS (SVN, CVS): Single server, one source of truth, requires network for most operations.
- Distributed VCS (Git, Mercurial): Every clone is a full repository, no single point of failure, works offline.
- Git's advantage: Fast branching, cheap merging, offline work, and guaranteed integrity.
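The "every clone is a full repository" point can be demonstrated without any server. In this sketch, the directory origin-repo is a hypothetical stand-in for a shared repository, and my-clone is a hypothetical developer copy; both live in a scratch directory.

```shell
work=$(mktemp -d)
cd "$work"

# A stand-in for a shared repository with two commits.
git init -q origin-repo
cd origin-repo
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity
echo 'v1' > notes.txt
git add notes.txt
git commit -q -m "First version"
echo 'v2' >> notes.txt
git commit -q -am "Second version"
cd ..

# Cloning copies the history, not just the latest files.
git clone -q origin-repo my-clone
cd my-clone

# These commands read only the clone's own .git directory.
git log --oneline
count=$(git rev-list --count HEAD)
echo "commits in clone: $count"
```

After the clone, the original directory could be deleted and the log in my-clone would still show both commits.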
📝 Step-by-Step: Your First Git Repository
Let's walk through creating your first Git repository and making your first commit. Follow these steps carefully:
Step 1: Create a new directory and initialize Git
Open your terminal and run:
    mkdir my-first-repo
    cd my-first-repo
    git init
This creates a new directory and initializes an empty Git repository. You'll see a message like "Initialized empty Git repository in /path/to/my-first-repo/.git/". The .git folder contains all of Git's internal data — this is where Git stores your commits, branches, and configuration. Never delete this folder unless you want to lose your entire repository history.
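For the curious, the .git folder can be inspected right after initialization. This is a brief sketch in a scratch directory; the exact list of files varies somewhat between Git versions, so the listing is not reproduced here.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Git's internal data lives in .git at the top of the working directory.
ls .git

# HEAD is a small text file naming the current branch.
cat .git/HEAD

# The same location, as reported by Git itself.
git rev-parse --git-dir
```

Looking is harmless; editing files inside .git by hand is best avoided until you know what each one does.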
Step 2: Create your first file
    echo "Hello, Git!" > README.md
This creates a file called README.md with the text "Hello, Git!". You can also create files using any text editor. This file is now in your working directory but Git doesn't know about it yet.
Step 3: Check the status
    git status
Git shows you that README.md is an untracked file. This means Git sees the file but isn't tracking it yet. You'll see: "Untracked files: README.md" and a hint to use "git add" to track it.
Step 4: Stage the file
    git add README.md
This adds the file to the staging area. The staging area is a temporary holding area where you prepare changes before committing them. Run git status again — you'll see "Changes to be committed: new file: README.md" in green.
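The move from untracked to staged is easy to see in Git's script-friendly status format. The sketch below repeats Steps 2 to 4 in a scratch directory and uses git status --porcelain, which prints a two-character code per file instead of the longer human-readable text.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
echo 'Hello, Git!' > README.md

# Before staging: "??" marks an untracked file.
before=$(git status --porcelain)
echo "$before"

git add README.md

# After staging: "A" marks a new file in the staging area.
after=$(git status --porcelain)
echo "$after"

# List exactly which files the next commit would contain.
git diff --cached --name-only
```

The staging area lets you choose which changes go into the next commit, so a file you are still working on can stay out of it.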
Step 5: Commit the file
    git commit -m "Add README with greeting"
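This creates your first commit. The -m flag specifies the commit message. A good commit message describes what you changed and why. If Git stops with a message asking who you are, set your name and email with git config user.name and git config user.email, then commit again. Run git log to see your commit history: you'll see your commit with its unique hash, author, date, and message.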
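Git records an author on every commit, so an identity must be configured before the first commit succeeds. The following sketch sets one for a single scratch repository (the name and email are placeholders) and then reads fields back out of the log.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q

# Identity for this repository only; replace with your own details.
git config user.name "Example User"
git config user.email "user@example.com"

echo 'Hello, Git!' > README.md
git add README.md
git commit -q -m "Add README with greeting"

# Full log entry: hash, author, date, message.
git log

# Pull out single fields with a format string.
subject=$(git log -1 --format=%s)
author=$(git log -1 --format=%an)
echo "$subject (by $author)"
```

Adding --global to the two git config commands would apply the identity to every repository for your user account instead of just this one.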
Step 6: Make a change and commit again
    echo "This is my first project." >> README.md
    git add README.md
    git commit -m "Add project description to README"
You've now made two commits. Run git log --oneline to see a compact view of your commit history. Each commit has a unique identifier (the hash) that you can use to reference it later.
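As one way to use a commit hash for reference, the sketch below rebuilds the two commits from the steps above in a scratch repository (with a placeholder identity) and then views README.md as it was in the first commit.

```shell
repo=$(mktemp -d)
cd "$repo"
git init -q
git config user.name "Example User"        # placeholder identity
git config user.email "user@example.com"   # placeholder identity

echo "Hello, Git!" > README.md
git add README.md
git commit -q -m "Add README with greeting"

echo "This is my first project." >> README.md
git add README.md
git commit -q -m "Add project description to README"

# Compact history, newest first.
git log --oneline

# Find the hash of the first commit (the one with no parent).
first=$(git rev-list --max-parents=0 HEAD)

# Show README.md exactly as it was in that commit.
old=$(git show "$first:README.md")
echo "$old"
```

The working copy of README.md still has both lines; git show only reads the older version out of the object database.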
Try It Yourself: Create a file called "notes.txt", add some content, stage it, and commit it. Then use git log to see all three commits in your history.