Introduction to SQL & Databases (A Step-by-Step Guide)

Dataquest
15 min read · Jun 15, 2024

This is the first post in a multi-part series designed to help you learn SQL in just six weeks! Whether you’re a complete beginner or looking to refresh your skills, this series will provide you with the knowledge and practical experience needed to master SQL.

I’m Casey Bates, director of course development at Dataquest, and in this post I want to talk to you about learning SQL. When people ask me if they should learn Python or R first as an aspiring data professional, I always recommend starting with SQL instead. It might not be as flashy as the latest machine learning libraries, but SQL is the foundation that all data work is built on. I speak from experience: SQL was the first topic I taught as a data science instructor, even though my expertise was in R.

SQL is the quiet workhorse that runs everything in the data world, from small business databases to massive big tech applications. At Dataquest, we use SQL every single day to monitor the quality of our courses and quickly diagnose and fix any issues that arise. Having SQL skills prepares you to take on any data challenge that comes your way, big or small.

In my years working with data, I’ve collaborated with dozens of analysts, scientists and engineers. And you know what skill they all had in common? SQL. When it comes to querying databases, sharing analyses with your team, and constructing data pipelines, SQL is the universal language that connects data professionals.

I still remember how overwhelming it felt when I first started learning SQL. It was like staring out at a vast ocean of data without a clear starting point. But as I began writing queries and working with real datasets, I quickly realized just how powerful SQL is for asking the right questions and surfacing actionable insights. Knowing SQL has helped me make a real difference in my work by enabling data-driven decisions that keep our team focused.

Learning SQL from the ground up gives you a solid understanding of databases, tables, queries, joins, and more: concepts you’ll use again and again as you progress in your data career. It’s a skill that will serve you well no matter what domain or industry you end up working in. For an in-depth exploration of why and how to learn SQL, check out this insightful post.

When it comes to learning SQL, the journey begins with understanding the basics and progressively moving towards more complex concepts. In this post, I will guide you through a structured approach to learning how to write basic SQL queries and understand database structures.

Exploring the Database and Schema

Have you ever found yourself staring at a massive spreadsheet, eyes glazing over as you scroll past endless rows and columns? I know I have. When working with large datasets, it’s easy to feel overwhelmed.

That’s where databases come in. They allow us to store and organize huge amounts of data in a structured way, just like a well-designed spreadsheet. But instead of one giant table, a database contains multiple tables that are related to each other. It’s like having separate sheets for different types of data, but with the ability to link them together.

To actually retrieve data from a database, we use a language called SQL. Think of it as a precise set of instructions telling the database what information you want to see. It’s like raising your hand and saying, “Hey database, can you show me just the sales numbers for last month?” And the database responds by handing you a neatly formatted result.

Let’s take a look at an example SQL query:

SELECT *
FROM orders
LIMIT 5;

This query is basically saying, “Select all columns from the orders table, but only give me back the first 5 rows.” The LIMIT clause is helpful when you're exploring a new database and just want to get a sense of what the data looks like without being bombarded with thousands of results. Here’s what the output looks like:

As you can see, there are only 5 rows. There are actually more columns, as indicated by the dots (...), but we’ve simplified the output for now.
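If you’d like to run queries like this yourself without installing a database server, Python’s built-in sqlite3 module is enough. The orders table and its values below are made up purely for illustration:

```python
import sqlite3

# Build a tiny, made-up "orders" table in an in-memory SQLite database.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (order_id TEXT, product_name TEXT, sales REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [(f"ORD-{i}", f"Product {i}", 10.0 * i) for i in range(1, 21)],
)

# LIMIT caps the number of rows returned -- handy for a first look.
rows = conn.execute("SELECT * FROM orders LIMIT 5").fetchall()
for row in rows:
    print(row)
```

Even though the table holds 20 rows, the query hands back only the first 5, which is exactly the behavior you want when peeking at an unfamiliar database.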

When I first started working with databases, I found it helpful to think of them like a library. Each table is like a different section — fiction, non-fiction, mystery, sci-fi. And SQL is like the friendly librarian who helps you find exactly the book you’re looking for, even if you only know the general topic or author’s name.

Later in this post, we’ll learn how to explore individual tables and columns within a database. But for now, just remember that databases give structure to our data, while SQL is the key to extracting the insights hidden inside. With these tools in your kit, that overwhelming spreadsheet will start to feel more manageable.

Coming up, we’ll look at how we can understand the structure of our data.

Exploring Tables and Columns

With SQL as your guide, you can explore and make sense of even the most complex databases. The key is understanding your database’s schema — the blueprint that shows how all the pieces fit together.

Exploring a database schema is important for writing effective queries and uncovering insights. Once you know what information is available and how it’s organized, you can ask the right questions to extract value from your data. At Dataquest, we use SQL to explore our course database schema and identify areas for improvement, like the relationship between lesson completion and code attempts.

To explore a schema, start by understanding how databases use tables (like spreadsheets) to organize information in rows and columns. Each column has a specific data type, such as:

  • TEXT for strings
  • INTEGER for whole numbers
  • REAL for decimals

SQL provides a handy command to retrieve metadata about a table’s structure:

PRAGMA table_info(orders);

This command returns one row for each column in the “orders” table, showing the column name, data type, and other useful metadata.

You can also retrieve just the data type for a specific column, like this:

SELECT name, type
FROM pragma_table_info('orders')
WHERE name = 'sales';

This shows the column name and data type for the ‘sales’ column in the ‘orders’ table:

name    type
sales   REAL

Knowing the schema helps you write efficient queries, avoid errors, and understand relationships between tables.
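To see both commands in action, here’s a minimal sketch using Python’s built-in sqlite3 module; the orders table is a made-up stand-in just so there’s a schema to inspect:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
# A made-up "orders" table, just to have a schema to inspect.
conn.execute("CREATE TABLE orders (order_id TEXT, product_name TEXT, sales REAL)")

# PRAGMA table_info returns one row per column:
# (cid, name, type, notnull, dflt_value, pk)
schema = conn.execute("PRAGMA table_info(orders)").fetchall()
for cid, name, col_type, *_ in schema:
    print(name, col_type)

# pragma_table_info can also be queried like a table, so you can
# filter down to a single column's type with WHERE.
sales_type = conn.execute(
    "SELECT name, type FROM pragma_table_info('orders') WHERE name = 'sales'"
).fetchone()
print(sales_type)  # ('sales', 'REAL')
```

The table-valued pragma_table_info form requires a reasonably recent SQLite (3.16+), which the Python standard library has bundled for years.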

Exploring the schema is the first key step in the SQL workflow. With practice, navigating database structures will become second nature. In the next section, we’ll use this knowledge to explore how we extract insights from databases.

Filtering with Numbers

When getting started with SQL, one of the biggest challenges of working with large amounts of data is finding the information you’re looking for. This is where filtering with numbers comes in. By applying comparison operators, conditional statements, and other criteria to the numbers in your data, you can precisely target the records you need to answer key business questions.

We use numeric filters at Dataquest every week to zero in on specific course metrics. For example, by analyzing screens with high abandonment rates, we can quickly identify lessons that may be too challenging or have bugs that need fixing. This targeted approach helps us make data-informed decisions to improve the learning experience.

So how do these numeric filters work? Let’s break it down:

  • Comparison operators like <, >, and = check the relationship between quantities
  • BETWEEN finds values within a consecutive range, like order totals from $100 to $500
  • IN checks for values in a non-consecutive list, which is handy for analyzing unique scenarios:
SELECT order_id, product_name, sales, discount
FROM orders
WHERE discount IN (0.15, 0.32, 0.45);

Here’s what we see in the first five rows:

Some other commands I use a lot are:

  • AND/OR combine multiple criteria to refine your results
  • IS NULL identifies missing values that may need investigation

You can also use numeric filters to find outliers and potential issues, like products with negative profits or suspiciously low quantities:

SELECT product_name, profit, quantity
FROM orders
WHERE profit < -100 OR quantity = 1;

And here’s what we see:

The key is to let your curiosity guide you. The more you explore your data using these techniques, the more insights you’ll uncover. Numeric filters are great for answering probing questions that can lead to real impact.
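Here’s a small, self-contained sketch of both numeric filters above, again using Python’s sqlite3 with invented rows (the product names and figures are hypothetical):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (product_name TEXT, profit REAL, quantity INTEGER, discount REAL)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("Desk",    250.0, 3, 0.15),
        ("Binder", -180.0, 2, 0.32),
        ("Lamp",     40.0, 1, 0.00),
        ("Chair",  -950.0, 5, 0.45),
    ],
)

# IN targets a non-consecutive list of discount values.
discounted = conn.execute(
    "SELECT product_name FROM orders WHERE discount IN (0.15, 0.32, 0.45)"
).fetchall()

# OR combines two outlier checks: big losses or single-unit orders.
outliers = conn.execute(
    "SELECT product_name FROM orders WHERE profit < -100 OR quantity = 1"
).fetchall()

print([r[0] for r in discounted])  # ['Desk', 'Binder', 'Chair']
print([r[0] for r in outliers])    # ['Binder', 'Lamp', 'Chair']
```

Notice how the Lamp row fails the discount filter but gets flagged as an outlier, while Desk is the reverse; each filter answers a different question about the same table.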

Filtering numbers is just the beginning though. Next, we’ll look at how to analyze text data to take your SQL skills even further. Let’s read on.

Filtering with Strings and Categories

In the last section, we saw how numeric filters help us home in on specific quantitative criteria in our data. But what about all the text columns in our database? Product names, customer emails, order IDs — these fields often hold valuable qualitative insights. That’s where filtering with strings and categories comes in.

We use text filters at Dataquest to track different types of blog posts. By searching for naming patterns with the LIKE operator and wildcards, we can quickly keep track of how many new blog posts we’ve published, and how many existing ones we’ve updated.

Let’s break down how text and categorical filters work:

  • SELECT DISTINCT returns a list of unique text values in a column, like all product categories
  • IN checks for membership in a list of specific text values, while NOT IN excludes them
  • LIKE enables fuzzy searching for patterns using wildcards:
  • % matches any number of characters
  • _ matches a single character
SELECT DISTINCT ship_mode, state
FROM orders
WHERE state IN ('District of Columbia', 'North Dakota', 'Vermont', 'West Virginia');

Here’s what the output looks like:

We can also combine LIKE with wildcards to search for specific text patterns. Imagine we have a damaged product label that reads "Pr___ C_l__ed ___cils". We can find potential matches like this:

SELECT DISTINCT product_name
FROM orders
WHERE product_name LIKE 'Pr___ C_l__ed %';

This query looks for product names starting with “Pr”, followed by any 3 characters, then “ C” followed by any single character, then “l”, then any 2 characters, then “ed “, and finally any number of characters after that.

Here’s the result:

By creatively combining these techniques, you can filter your data in powerful ways to uncover valuable subsets and patterns. The key is to think critically about what text data might hold the answers you’re looking for. Also, build your queries iteratively so that you can evaluate what’s happening after each step.
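Both text-filtering techniques can be tried end to end with a few lines of Python and sqlite3; the rows below are fabricated to include one duplicate and one wildcard match:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product_name TEXT, state TEXT, ship_mode TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Prang Colored Pencils", "Vermont",      "Standard"),
        ("Prang Colored Pencils", "Vermont",      "Standard"),  # duplicate row
        ("Stapler",               "North Dakota", "Express"),
        ("Desk Lamp",             "California",   "Standard"),
    ],
)

# DISTINCT collapses duplicate combinations; IN keeps only the listed states.
rows = conn.execute(
    "SELECT DISTINCT ship_mode, state FROM orders "
    "WHERE state IN ('Vermont', 'North Dakota')"
).fetchall()
print(rows)

# LIKE with _ (exactly one character) and % (any run of characters).
matches = conn.execute(
    "SELECT DISTINCT product_name FROM orders "
    "WHERE product_name LIKE 'Pr___ C_l__ed %'"
).fetchall()
print(matches)  # [('Prang Colored Pencils',)]
```

The duplicate Vermont row disappears thanks to DISTINCT, and the damaged-label pattern from above matches “Prang Colored Pencils” exactly as described.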

Coming up, we’ll explore how to sort query results to further refine our data and surface the most relevant insights. Filtering and sorting go hand-in-hand to give you maximum control over your data. Keep reading to learn more!

Sorting Results

In the last section, we saw how text and categorical filters allow us to segment our data in powerful ways. But sometimes, we need to go a step further and sort those segments to surface the most relevant insights. That’s where the ORDER BY clause comes in.

I use sorting in SQL all the time to evaluate the performance of our lessons at Dataquest. By ordering lesson screens by completion rate from lowest to highest, I can quickly identify which ones might be too challenging or confusing for learners. This helps me prioritize where to focus my optimization efforts.

So how does sorting work under the hood? Let’s break it down:

  • ORDER BY sorts query results by the values in one or more specified fields
  • By default, sorting is done in ascending order (low to high for numbers, A to Z for text)
  • Adding the DESC keyword after a field name sorts it in descending order instead
  • You can sort by multiple fields in different orders to fine-tune your results

For example, let’s say we want to find the most profitable orders in our database:

SELECT order_id, product_name, profit
FROM orders
ORDER BY profit DESC;

Here’s what the output might look like:

We can see that the query sorted the results by the “profit” column from highest to lowest, thanks to the DESC keyword.

But ORDER BY becomes even more powerful when combined with other clauses like WHERE and LIMIT. In the query order of execution, ORDER BY comes after WHERE but before LIMIT. This means we can filter our data first, then sort the remaining results, and finally limit the output to just the top records we need.

Imagine your boss asks you to find the 10 largest orders by quantity for the Central region’s Office Supplies category. Here’s how you might do it:

SELECT order_id, quantity
FROM orders
WHERE region = 'Central' AND category = 'Office Supplies'
ORDER BY quantity DESC
LIMIT 10;

And here’s the result:

By combining a WHERE filter, ORDER BY sorting, and LIMIT, we narrowed down over 9000 records to just the top 10 we needed in a single query. And this is just scratching the surface of what you can do!
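You can watch the filter-then-sort-then-limit pipeline work on a toy dataset; the five invented orders below include two rows that the WHERE clause should discard:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE orders (order_id TEXT, region TEXT, category TEXT, quantity INTEGER)"
)
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?, ?)",
    [
        ("ORD-0", "Central", "Office Supplies", 14),
        ("ORD-1", "Central", "Office Supplies",  9),
        ("ORD-2", "Central", "Furniture",       20),  # wrong category, filtered out
        ("ORD-3", "East",    "Office Supplies", 30),  # wrong region, filtered out
        ("ORD-4", "Central", "Office Supplies",  5),
    ],
)

# WHERE runs first, then ORDER BY, and LIMIT trims the final output.
top = conn.execute(
    "SELECT order_id, quantity FROM orders "
    "WHERE region = 'Central' AND category = 'Office Supplies' "
    "ORDER BY quantity DESC "
    "LIMIT 2"
).fetchall()
print(top)  # [('ORD-0', 14), ('ORD-1', 9)]
```

Even though ORD-2 and ORD-3 have the biggest quantities, they never reach the sorting step because the WHERE filter removes them first.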

The beauty of ORDER BY is how it builds on the other SQL concepts we've learned to give you incredible control over your data. Whenever you're faced with a data question, think about how sorting might help you find the answer faster.

Next up, we’ll explore how to use conditional logic in SQL to segment and analyze our data in even more granular ways. Keep reading, because things are about to get even more interesting.

Conditional Statements and Style

Now let’s learn about how we can transform your data on the fly, categorizing or sorting it in ways that go beyond basic filtering. Conditional logic and CASE expressions are often used for this purpose.

Recall the example from earlier where I mentioned how we use SQL to keep track of new and updated blog posts. Our query for this actually includes a WHEN ... THEN statement, like this:

WHEN blog_name LIKE '%post%' OR blog_name LIKE '%Post%' THEN 'New'

This categorizes any blog with ‘post’ or ‘Post’ in the name as a new post. Conditional logic with CASE expressions enables these kinds of powerful data transformations.

Let’s take a look at how CASE expressions work:

  • CASE uses WHEN, THEN, ELSE to specify conditions and outcomes
  • You can create new columns or sort results based on these conditions
  • Leaving out the ELSE clause results in missing values for any unmet conditions

Some common use cases for CASE expressions include:

  • Binning numeric values into categories (e.g. grouping sales into small, medium, large buckets)
  • Consolidating multiple text values into a single category
  • Prioritizing certain results to the top or bottom when sorting

Let’s look at a couple examples. Say we want to categorize profit margin for Supplies products in Los Angeles:

SELECT order_id, product_name, profit/sales AS profit_margin,
CASE
WHEN profit/sales > 0.3 THEN 'Great'
WHEN profit/sales < 0.1 THEN 'Terrible'
END AS profit_category
FROM orders
WHERE subcategory = 'Supplies' AND city = 'Los Angeles'
ORDER BY profit_category DESC;

This query creates a new profit_category column that bins the profit_margin calculation into 'Great', 'Terrible', or missing buckets based on the specified thresholds. We can quickly see which products are performing well or poorly.
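Here’s a runnable miniature of that binning pattern, with invented products chosen so one falls into each bucket (including the missing-value case from leaving out ELSE):

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (product_name TEXT, sales REAL, profit REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?, ?)",
    [
        ("Stapler", 100.0, 40.0),  # margin 0.40 -> 'Great'
        ("Binder",  100.0,  5.0),  # margin 0.05 -> 'Terrible'
        ("Tape",    100.0, 20.0),  # margin 0.20 -> no match, NULL
    ],
)

rows = conn.execute("""
    SELECT product_name,
           CASE
               WHEN profit / sales > 0.3 THEN 'Great'
               WHEN profit / sales < 0.1 THEN 'Terrible'
           END AS profit_category
    FROM orders
""").fetchall()
print(rows)  # [('Stapler', 'Great'), ('Binder', 'Terrible'), ('Tape', None)]
```

The Tape row illustrates the earlier warning: with no ELSE clause, a margin between 0.1 and 0.3 produces NULL (None in Python) rather than an empty string.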

We can also use CASE expressions in the ORDER BY clause to sort results in a customized, non-alphabetical way. Imagine we want to prioritize the Corporate and Consumer segments to the top of the results:

SELECT segment, subcategory, product_name, sales, profit
FROM orders
WHERE city = 'Watertown'
ORDER BY CASE
WHEN segment = 'Corporate' THEN 1
WHEN segment = 'Consumer' THEN 2
ELSE 3
END;

By assigning ‘ranks’ to each segment in the CASE expression, we can force the results to display Corporate first, Consumer second, and everything else third, regardless of alphabetical order.
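The custom-sort trick is easy to verify on a three-row made-up table: insert the segments out of order and check that the CASE ranks reorder them.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE orders (segment TEXT, product_name TEXT)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [
        ("Home Office", "Desk"),
        ("Consumer",    "Lamp"),
        ("Corporate",   "Chair"),
    ],
)

# CASE assigns a sort rank: Corporate first, Consumer second, the rest last.
ordered = conn.execute("""
    SELECT segment, product_name
    FROM orders
    ORDER BY CASE
        WHEN segment = 'Corporate' THEN 1
        WHEN segment = 'Consumer'  THEN 2
        ELSE 3
    END
""").fetchall()
print([r[0] for r in ordered])  # ['Corporate', 'Consumer', 'Home Office']
```

Alphabetical order would put Consumer first; the CASE ranks override that and surface Corporate to the top.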

As you can see, CASE expressions allow you to transform and analyze your data in ways that go beyond basic filtering and sorting. They're a powerful tool to have in your SQL toolkit.

To close out, let’s briefly touch on the concept of “performant” SQL. When working with large datasets, writing queries that optimize performance becomes important. A few best practices include:

  1. Avoiding SELECT * and only selecting the columns you need
  2. Using LIMIT to sample results instead of returning everything
  3. Favoring IN over OR for compound conditions

By keeping performance in mind from the start, you’ll be able to analyze even the largest datasets with ease.

In the next and final section of this post, we’ll see how we can put everything we’ve learned together in a hands-on project analyzing Kickstarter data.

Guided Project: Analyzing Kickstarter Projects

We’ve covered a lot of ground in this post, from basic SQL queries to more advanced concepts like filtering, sorting, and conditional logic. But the real magic happens when you put all these pieces together to solve a realistic problem. That’s exactly what we’ll do in this final section, with a hands-on project analyzing Kickstarter data.

Analyzing Kickstarter Projects is a great project for folks starting out with SQL. You’ll take on the role of a data analyst at a startup considering launching a Kickstarter campaign to test product viability. The goal is to identify factors that influence the success or failure of campaigns.

So where do you start? The first step is always to examine the structure of your database. You can use a query like this to retrieve the column names and data types:

PRAGMA table_info(ksprojects);

This gives you a quick overview of what data you have to work with and how it’s organized.

Next, you’ll combine filtering, sorting, and CASE expressions to extract insights from the data. For example, let's say you want to analyze failed Kickstarter campaigns with a minimum level of funding and backers to see how close they came to reaching their goals:

SELECT main_category, backers, pledged, goal,
pledged / goal AS pct_pledged,
CASE
WHEN pledged / goal >= 1 THEN 'Fully funded'
WHEN pledged / goal BETWEEN .75 AND 1 THEN 'Nearly funded'
ELSE 'Not nearly funded'
END AS funding_status
FROM ksprojects
WHERE state IN ('failed')
AND backers >= 100 AND pledged >= 20000
ORDER BY main_category, pct_pledged DESC
LIMIT 10;

This query showcases many of the concepts we’ve covered, like:

  • Selecting and aliasing columns
  • Filtering with WHERE and IN
  • Using CASE to create a new column with conditional logic
  • Sorting results with ORDER BY
  • Limiting output with LIMIT
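The real Kickstarter dataset isn’t bundled here, but you can exercise the same shape of query against a tiny stand-in ksprojects table; the four campaigns below are invented so that the WHERE clause excludes two of them:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE ksprojects "
    "(main_category TEXT, state TEXT, backers INTEGER, pledged REAL, goal REAL)"
)
conn.executemany(
    "INSERT INTO ksprojects VALUES (?, ?, ?, ?, ?)",
    [
        ("Games",  "failed",     500, 45000.0, 50000.0),  # 90% pledged
        ("Games",  "failed",     150, 21000.0, 80000.0),  # 26% pledged
        ("Music",  "failed",      50, 30000.0, 30000.0),  # too few backers
        ("Design", "successful", 900, 90000.0, 60000.0),  # did not fail
    ],
)

results = conn.execute("""
    SELECT main_category, pledged / goal AS pct_pledged,
           CASE
               WHEN pledged / goal >= 1 THEN 'Fully funded'
               WHEN pledged / goal BETWEEN 0.75 AND 1 THEN 'Nearly funded'
               ELSE 'Not nearly funded'
           END AS funding_status
    FROM ksprojects
    WHERE state = 'failed' AND backers >= 100 AND pledged >= 20000
    ORDER BY main_category, pct_pledged DESC
""").fetchall()
for row in results:
    print(row)
```

Only the two Games campaigns survive the filter, and the one that raised 90% of its goal lands in the 'Nearly funded' bucket, which is exactly the kind of near-miss the analysis is designed to surface.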

By analyzing the results, you might uncover trends in project categories, funding goals, or backer engagement that could guide your own campaign strategy. You could even use this analysis to build a compelling case study for your data portfolio.

One of my favorite ways to learn is what I call “parallel learning”. After I complete a course or a project, I switch data sources and repeat the general workflow again. Even the smallest differences in a new dataset can create interesting challenges. And there’s no better feeling than figuring things out and sharing your results with your friends or on your portfolio.

The key is to let your curiosity guide you and not be afraid to experiment. The more you practice on real-world datasets, the more comfortable and confident you’ll become with SQL.

Of course, this Kickstarter project is just one example. The beauty of SQL is that it can be applied to any domain, from marketing and finance to healthcare and sports. The skills you’ve learned in this series will serve you well no matter what field you’re in.

Conclusion

When I first started learning SQL, I remember feeling overwhelmed by the sheer volume of data and the complexity of queries. But with each small victory — a successful filter, a well-crafted conditional statement, a revealing insight — I grew more confident and curious. Looking back, I’m amazed at how those fundamental SQL skills have served me during my time at Dataquest. The key insights I’ve gained — the importance of understanding your data, the power of filtering and sorting, the flexibility of conditional logic — continue to guide my work every day.

If you’re feeling inspired to start your own SQL journey, my advice is to try it for yourself. Don’t worry if everything doesn’t click right away — learning SQL is a process, and every challenge is an opportunity to grow. The key is to stay curious, practice regularly, and celebrate your progress along the way. And if you’re looking for a supportive community and engaging resources to keep you motivated, the Dataquest Introduction to SQL and Databases course is a great place to start.

Remember, even the most complex queries start with a simple SELECT statement. With patience, persistence, and a passion for learning, you'll be amazed at what you can achieve with SQL. So start learning SQL today, and remember that you have the support of the data community as you learn and grow.
