Sets — Introduction

In short

A set is a collection of distinct objects, treated as a single new object. The only thing a set knows about each object is whether that object is in the set or not in it. You can describe a set by listing its elements (roster form) or by stating a property the elements satisfy (set-builder form). Sets come in standard types — empty, finite, infinite, universal — and one set is a subset of another if every element of the first is also an element of the second. The collection of all subsets of a set is itself a set, called the power set, and every interval on the real number line — open, closed, half-open — is a particular subset of \mathbb{R} written in interval notation.

Look at a basket of fruits. There is a mango in it, two bananas, an apple, and a guava. Now ask a strange-sounding question: which things are in the basket, and which are not?

The list of things in the basket is short: mango, banana, apple, guava. The list of things not in the basket is everything else in the universe — your school bag, the chair you're sitting on, the rings of Saturn, the entire population of Lucknow. The basket is the simplest possible idea, but it does something quietly powerful: it draws a line that divides the world into two halves. Inside and outside. Element of the basket and not an element of the basket.

That is exactly what a set is. A set is a collection of distinct objects, and the only fact about each object that the set cares about is whether the object is in the set or not. There is no order, no count of duplicates, no extra structure. A set is the smallest possible mathematical object that captures the idea of "this thing belongs, and that thing does not."

This article is the working introduction to sets — the language that every later chapter of mathematics is going to use. Sets show up in Number Systems (where you have already seen \mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R} as set names without the formal definition), in functions (a function takes one set to another), in probability (an event is a set of outcomes), in geometry (a circle is a set of points), and in essentially every advanced topic. The notation is small, the rules are few, and the payoff is enormous.

What a set is

A set is a collection of distinct objects. The objects are called the elements (or members) of the set. There are two notations and one fundamental relation:

The set itself is written between curly braces: \{1, 2, 3\}, \{a, e, i, o, u\}, \{\text{red}, \text{green}, \text{blue}\}.
The relation "x is an element of A" is written x \in A, with the symbol "\in" (a stylised form of the Greek letter epsilon, read as "in" or "is an element of"). The negation, "x is not an element of A," is written x \notin A.

So if A = \{1, 2, 3\}, then 1 \in A (true), 2 \in A (true), and 5 \notin A (true). The relation \in is the most important symbol in the language of sets — almost every other definition is built on top of it.

Two facts about set notation that often catch beginners by surprise:

Order doesn't matter. \{1, 2, 3\} and \{3, 2, 1\} and \{2, 1, 3\} are the same set. There is no "first element" of a set, no "third element"; a set is just a collection of memberships, not a sequence.
Repetition doesn't matter. \{1, 1, 2, 3\} is the same set as \{1, 2, 3\}. The set knows that 1 is in it; saying "1 is in it" twice does not put 1 in twice. Sets contain distinct objects.

These two conventions might look strange after a lifetime of dealing with lists (where order and repetition matter very much), but they are exactly what makes sets useful as a tool for thought. A set captures which things are in the collection, nothing more, nothing less. If you also need to track order or repetition, you use a different mathematical object — a tuple or a sequence — but for the questions sets are designed to answer, the stripped-down version is the right one.
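
Python's built-in `set` type follows the same two conventions, so a short sketch can make them concrete. The variable names are illustrative only, and the list comparison is included purely as a contrast.

```python
# Membership: the one question a set can answer about an object.
A = {1, 2, 3}
print(1 in A)      # True
print(5 not in A)  # True

# Order doesn't matter: these literals all denote the same set.
same_order = {1, 2, 3} == {3, 2, 1} == {2, 1, 3}

# Repetition doesn't matter: listing 1 twice does not put it in twice.
same_repeats = {1, 1, 2, 3} == {1, 2, 3}

# A list, by contrast, tracks both order and repetition.
lists_differ = [1, 2, 3] != [3, 2, 1] and [1, 1, 2, 3] != [1, 2, 3]

print(same_order, same_repeats, lists_differ)  # True True True
```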

A set $A$ as an inside-outside boundary. Anything inside the boundary is an element of $A$ (so $\text{mango} \in A$, $\text{banana} \in A$, and so on); anything outside is not (so $\text{bat} \notin A$, $\text{Saturn} \notin A$). The set is the boundary itself — the rule that decides what is in and what is out.

Two ways to describe a set

There are two standard notations for writing down what's in a set, and you will see both constantly.

Roster form (also called listing form). Just list the elements between curly braces, separated by commas:

A = \{1, 2, 3, 4, 5\}

B = \{a, e, i, o, u\}

C = \{\text{red}, \text{green}, \text{blue}\}

This is the most direct notation, and it works whenever you can write all the elements down. For finite sets with a small number of elements, roster form is unbeatable.

For infinite sets — or for sets where the list of elements is too long to write — roster form is impractical. You can fudge it for some "obvious" infinite sets by using "\dots" to indicate the pattern: \mathbb{N} = \{1, 2, 3, 4, \dots\}, or \{2, 4, 6, 8, \dots\} for the positive even numbers. But the fudge is only as clear as the pattern it suggests, and it doesn't work for sets where there is no obvious next element.

Set-builder form. The cleaner notation for sets that can't be listed is to describe the elements by a property they satisfy:

A = \{x \mid x \text{ is a positive even integer less than } 10\}

The vertical bar "\mid" is read "such that" (some textbooks use a colon instead of a bar — both are correct). The expression on the left of the bar names a typical element, and the expression on the right gives the rule that decides which elements are in. So the set above unfolds as: "the set of all x such that x is a positive even integer less than 10" — which is \{2, 4, 6, 8\}.

A few common set-builder examples:

\{x \in \mathbb{R} \mid x > 0\} — the set of positive real numbers
\{x \in \mathbb{Z} \mid x^2 < 20\} — the set of integers whose square is less than 20, which is \{-4, -3, -2, -1, 0, 1, 2, 3, 4\}
\{x \mid x \text{ is a prime number}\} — the set of all primes

Set-builder form is the right tool whenever you want to describe a set abstractly — by what its elements do, not by listing them — and it is the default notation for infinite sets in mathematical writing.
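
Set-builder notation has a close cousin in Python's set comprehensions, which read almost word for word as "the set of all x such that...". The sketch below rebuilds two of the sets above. One assumption to keep in mind: a comprehension has to draw its candidates from a finite range, so it can only model set-builder descriptions whose elements lie in a known finite range.

```python
# {x | x is a positive even integer less than 10}
A = {x for x in range(1, 10) if x % 2 == 0}

# {x in Z | x^2 < 20}; searching -10..10 is enough because any
# integer outside that range has a square of at least 121.
B = {x for x in range(-10, 11) if x * x < 20}

print(sorted(A))  # [2, 4, 6, 8]
print(sorted(B))  # [-4, -3, -2, -1, 0, 1, 2, 3, 4]
```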

Types of sets

Sets come in a few standard types, named for properties of their size or structure.

The empty set, \varnothing. This is the set with no elements at all: the basket containing nothing. Two notations are common: \varnothing (a symbol modelled on the Norwegian and Danish letter Ø) or \{\,\} (empty curly braces). They mean the same thing. The empty set is unique: there is only one empty set, regardless of what universe of objects you are working in, because two sets are equal exactly when they contain the same elements, and "no elements" matches "no elements" trivially. So whenever you write \varnothing, you are referring to that one specific set.

Finite sets. A set is finite if you can count its elements and the count is some specific whole number (0, 1, 2, 3, and so on). \{1, 2, 3\} has 3 elements, so it is finite. \{a, e, i, o, u\} has 5 elements, so it is also finite. The empty set is finite too: its count is 0. The number of elements in a finite set A is called the cardinality of A, written |A| or sometimes n(A). So |\{1, 2, 3\}| = 3.

Infinite sets. A set is infinite if it is not finite — that is, no matter what natural number you propose as a count, the set has more elements than that. The natural numbers \mathbb{N} are infinite, the integers \mathbb{Z} are infinite, the rationals \mathbb{Q} are infinite, and the real numbers \mathbb{R} are infinite. In fact, the going-deeper section will sketch how some infinite sets are bigger than others — but for now, "infinite" just means "you can't put a finite count on it."

Universal set, U. Often you are working within a fixed "universe" of possible objects — the integers, say, or the students in a class, or the points on a plane. The universal set U is just whatever container you are working inside. Every set you define in that discussion is a subset of U. The universal set is not a fixed thing — what counts as "the universe" depends on the conversation. In a discussion about real numbers, U = \mathbb{R}; in a discussion about students in a school, U is the set of students.

Singleton. A set with exactly one element is a singleton. \{7\} is a singleton. \{\pi\} is a singleton. An important distinction: \{7\} is a set containing the number 7, not the number 7 itself. The notation \{7\} refers to the singleton; the notation 7 refers to the number. They are different objects, related by the relation 7 \in \{7\}.
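
For finite sets, cardinality corresponds to Python's `len`, which gives a quick way to check the counts quoted above. This is only a sketch with made-up variable names; infinite sets such as \mathbb{N} have no direct counterpart as a Python `set`.

```python
empty = set()          # note: {} would create an empty dict, not a set
vowels = {"a", "e", "i", "o", "u"}
singleton = {7}

print(len(empty), len(vowels), len(singleton))  # 0 5 1

# The singleton {7} and the number 7 are different objects,
# related only by membership.
print(singleton == 7)   # False
print(7 in singleton)   # True
```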

Subsets and the power set

A set A is a subset of a set B — written A \subseteq B — if every element of A is also an element of B. So \{1, 2\} \subseteq \{1, 2, 3\} because both 1 and 2 are also in \{1, 2, 3\}. And \{1, 4\} \not\subseteq \{1, 2, 3\} because 4 is not in \{1, 2, 3\}.

Two important subset facts that look strange at first:

Every set is a subset of itself. A \subseteq A for any set A. The reason is that the definition is satisfied trivially: every element of A is, by definition, an element of A.
The empty set is a subset of every set. \varnothing \subseteq A for any set A. The definition is satisfied vacuously: there are no elements of \varnothing, so the statement "every element of \varnothing is also in A" has nothing to check, and is therefore true. (This is usually the first place in elementary set theory where vacuous truth shows up, and it is a useful logical move to get used to.)

If A \subseteq B but A \neq B (that is, every element of A is in B, but B has at least one element that is not in A), then A is a proper subset of B, written A \subset B (or A \subsetneq B to emphasise that the inclusion is strict; some textbooks use \subset for any subset, so check the convention in use). So \{1, 2\} \subset \{1, 2, 3\} is a proper subset (strict), while \{1, 2, 3\} \subseteq \{1, 2, 3\} is a subset but not a proper one.
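
The subset facts above can be checked with Python's comparison operators on sets, where `<=` plays the role of \subseteq and `<` the role of proper subset. The helper `is_subset` is a hypothetical name introduced here to spell out the definition; note that `all` over an empty collection returns `True`, which is vacuous truth in action.

```python
B = {1, 2, 3}

print({1, 2} <= B)    # True:  subset
print({1, 4} <= B)    # False: 4 is not in B
print(B <= B)         # True:  every set is a subset of itself
print(set() <= B)     # True:  the empty set is a subset of every set

# "<" is the proper-subset test: subset and not equal.
print({1, 2} < B)     # True
print(B < B)          # False

# The definition itself, written out with all():
def is_subset(a, b):
    """Return True if every element of a is also an element of b."""
    return all(x in b for x in a)
```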

Now for an idea that is not obvious until you see it. Given a set A, you can form a new set whose elements are all the subsets of A. This new set is called the power set of A, written \mathcal{P}(A) (or sometimes 2^A).

Take A = \{a, b\}. What are all its subsets? There are four: the empty set, the two singletons, and the set itself.

\mathcal{P}(A) = \big\{\,\varnothing,\, \{a\},\, \{b\},\, \{a, b\}\,\big\}

So \mathcal{P}(\{a, b\}) has 4 elements, even though \{a, b\} has only 2. The four "elements" of the power set are themselves sets — that is the strange-looking part. A power set is a set whose elements are sets. There is no contradiction: a set can contain anything, including other sets.

The pattern: if A has n elements, then \mathcal{P}(A) has 2^n elements. The reason is a clean counting argument — for each element of A, you have two independent choices when building a subset (include it, or don't include it). So the total number of subsets is 2 \times 2 \times \dots \times 2 = 2^n, with the 2's being multiplied together n times. The formula is one of the simplest applications of the laws of exponents from Exponents and Powers.

For A = \{1, 2, 3\} (n = 3), the power set has 2^3 = 8 elements:

\mathcal{P}(\{1, 2, 3\}) = \big\{\,\varnothing,\, \{1\},\, \{2\},\, \{3\},\, \{1, 2\},\, \{1, 3\},\, \{2, 3\},\, \{1, 2, 3\}\,\big\}

The power set grows very quickly with the size of A — by n = 10 it already has 1024 elements, and by n = 20 it has more than a million.
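
One way to build a power set by machine is to collect the subsets of every size from 0 up to n. The sketch below does this with the standard-library function `itertools.combinations`; `power_set` is a hypothetical helper name, and the subsets are stored as `frozenset` objects because Python only allows hashable objects as set elements.

```python
from itertools import combinations

def power_set(elements):
    """Return the power set of `elements` as a set of frozensets."""
    items = list(elements)
    return {
        frozenset(combo)
        for size in range(len(items) + 1)
        for combo in combinations(items, size)
    }

P = power_set({"a", "b"})
print(len(P))  # 4

# The growth quoted above: 2^n for n = 3, 10, 20.
for n in (3, 10, 20):
    print(n, 2 ** n)
```

Building the power set explicitly is only practical for small n, for exactly the reason the growth figures suggest.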

Intervals: subsets of the real line

Every interval on the real number line is, formally, a subset of \mathbb{R}. Interval notation is just a compact way to describe these subsets without writing them in full set-builder form every time.

There are four kinds of intervals, distinguished by whether each endpoint is included or excluded:

Notation	Set-builder form	Meaning
(a, b)	\{x \in \mathbb{R} \mid a < x < b\}	open — both endpoints excluded
[a, b]	\{x \in \mathbb{R} \mid a \leq x \leq b\}	closed — both endpoints included
[a, b)	\{x \in \mathbb{R} \mid a \leq x < b\}	half-open — left included, right excluded
(a, b]	\{x \in \mathbb{R} \mid a < x \leq b\}	half-open — left excluded, right included

The convention is that round brackets exclude the endpoint and square brackets include it. So [2, 5] is the set of all real numbers from 2 to 5 inclusive (that is, 2 and 5 themselves are in the set), while (2, 5) is the set of all real numbers strictly between 2 and 5 (so 2 and 5 are not in the set, but everything in between is). The difference is invisible at the level of "the interval has length 3" but very important at the level of "is the number 2 in this set or not?"

When an endpoint goes to infinity, you always use a round bracket on that side, because \infty is not a real number and cannot be "included" — it just indicates that the interval extends without bound. So (0, \infty) is the set of positive real numbers (open on the left because 0 is excluded, open on the right because \infty is not a real endpoint), and (-\infty, 5] is the set of all real numbers up to and including 5.
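
The bracket conventions amount to a choice between strict and non-strict comparison at each endpoint. A minimal sketch, assuming a hypothetical `Interval` class whose two flags record whether each endpoint is included, with `math.inf` standing in for an unbounded side:

```python
import math
from dataclasses import dataclass

@dataclass(frozen=True)
class Interval:
    """A real interval; the *_closed flags say whether each endpoint is included."""
    left: float
    right: float
    left_closed: bool = False
    right_closed: bool = False

    def __contains__(self, x):
        above = x >= self.left if self.left_closed else x > self.left
        below = x <= self.right if self.right_closed else x < self.right
        return above and below

closed = Interval(2, 5, left_closed=True, right_closed=True)   # [2, 5]
opened = Interval(2, 5)                                        # (2, 5)
positives = Interval(0, math.inf)                              # (0, inf)
up_to_five = Interval(-math.inf, 5, right_closed=True)         # (-inf, 5]

print(2 in closed, 2 in opened)      # True False
print(3.5 in closed, 3.5 in opened)  # True True
```

Leaving the infinite side open mirrors the round-bracket rule: the test `x < math.inf` holds for every real number, so nothing is excluded on that side.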

You can represent an interval by drawing it on the number line: a line segment with filled circles at included endpoints and open circles at excluded ones. Drag the two endpoints in the figure below. The shaded region between them is the open interval (a, b), and the readouts show the current endpoints and the length of the interval.

Drag the two sliders to set the endpoints of an open interval. The shaded segment between them is the set $(a, b) = \{x \in \mathbb{R} \mid a < x < b\}$, the readouts show the current endpoint values, and the length of the interval is just $b - a$. The two red dots themselves represent the endpoints — and because this is the open interval, they are *not* in the set, even though they are visible as markers.

Two worked examples

Example 1: Find the power set of $A = \{1, 2, 3\}$ and verify that $|\mathcal{P}(A)| = 2^3 = 8$

The power set of A is the set of all subsets of A. To find it systematically, list every possible subset by size — starting with the empty set, then the singletons, then the pairs, then the whole set itself.

Step 1. The empty set is always a subset.

\varnothing

Step 2. The singletons (subsets with exactly one element).

\{1\}, \quad \{2\}, \quad \{3\}

There are three of them, one for each element of A.

Step 3. The pairs (subsets with exactly two elements).

\{1, 2\}, \quad \{1, 3\}, \quad \{2, 3\}

There are three of these too — one for each choice of two elements from A.

Step 4. The whole set itself, which is always a subset of itself.

\{1, 2, 3\}

Step 5. Collect all the subsets into the power set.

\mathcal{P}(A) = \big\{\,\varnothing,\, \{1\},\, \{2\},\, \{3\},\, \{1, 2\},\, \{1, 3\},\, \{2, 3\},\, \{1, 2, 3\}\,\big\}

Step 6. Count.

The count is 1 + 3 + 3 + 1 = 8, which matches the formula |\mathcal{P}(A)| = 2^{|A|} = 2^3 = 8.

Why 2^n: each element of A has exactly two roles when you build a subset — either it is in the subset, or it isn't. The choices are independent across the n elements, so the total number of distinct subsets is 2 \times 2 \times \dots \times 2 = 2^n, with the 2's being multiplied n times. For A = \{1, 2, 3\} that gives 2 \times 2 \times 2 = 8, exactly the count above.

Result. |\mathcal{P}(A)| = 8.

All eight subsets of $\{1, 2, 3\}$, organised by their size. The counts at each size — $1$, $3$, $3$, $1$ — are exactly the entries of row $3$ of Pascal's triangle, which is no coincidence. The total count is $8 = 2^3$, exactly what the power-set formula predicts.
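
The size-by-size count in this example can be reproduced in a few lines. The sketch below counts the subsets of each size by listing them, then compares the counts with the binomial coefficients from `math.comb` (available in Python 3.8 and later), which are the Pascal's triangle entries.

```python
from itertools import combinations
from math import comb

A = [1, 2, 3]

# Number of subsets of each size 0, 1, 2, 3, found by listing them.
counts = [len(list(combinations(A, k))) for k in range(len(A) + 1)]
print(counts)        # [1, 3, 3, 1]
print(sum(counts))   # 8

# The same numbers are the binomial coefficients C(3, k).
binomials = [comb(len(A), k) for k in range(len(A) + 1)]
print(counts == binomials)  # True
```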

Example 2: Express the set $\{x \in \mathbb{R} \mid |x - 2| < 3\}$ as an interval and draw it on the number line

This set is given in set-builder form, with a condition involving the modulus. The plan is to unwrap the modulus into a double inequality, solve it, and then translate the answer into interval notation.

Step 1. Unwrap |x - 2| < 3 into a double inequality.

The modulus inequality |y| < c (for c > 0) is equivalent to -c < y < c. So |x - 2| < 3 becomes:

-3 < x - 2 < 3

Why: |x - 2| measures the distance of x from 2 on the number line. Saying that distance is less than 3 means x lies somewhere in the interval whose centre is 2 and whose half-width is 3 — that is, somewhere strictly between 2 - 3 = -1 and 2 + 3 = 5. The double inequality is the algebraic version of "within 3 units of 2."

Step 2. Add 2 to all three parts of the inequality.

-3 + 2 < x - 2 + 2 < 3 + 2

-1 < x < 5

Why: this is the do-the-same-thing-to-both-sides rule from Operations and Properties, applied to a three-part inequality. You add 2 everywhere, and the inequalities are preserved because adding the same number (positive or negative) to every part never flips an inequality. Only multiplying or dividing by a negative number does that.

Step 3. Translate into interval notation.

The set of all real x such that -1 < x < 5 is the open interval (-1, 5).

Result. \{x \in \mathbb{R} \mid |x - 2| < 3\} = (-1, 5).

The interval $(-1, 5)$ on the number line. The open circles at $-1$ and $5$ mean those points are *not* in the set; the thick red segment between them is the set itself. The centre of the interval is at $x = 2$, and the half-width is exactly $3$ — which is the geometric meaning of the original condition $|x - 2| < 3$, "$x$ is within $3$ units of $2$."
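
As a sanity check on the result, the two descriptions of the set can be compared on sample points. This is a spot check under an assumed grid of test values, not a proof; the function names are hypothetical.

```python
def in_set(x):
    """Membership test for {x in R | |x - 2| < 3}."""
    return abs(x - 2) < 3

def in_interval(x):
    """Membership test for the open interval (-1, 5)."""
    return -1 < x < 5

# Sample points from -3.0 to 7.0 in steps of 0.25, which includes
# the endpoints -1 and 5 exactly.
samples = [-3 + 0.25 * k for k in range(41)]
agree = all(in_set(x) == in_interval(x) for x in samples)

print(agree)                  # True
print(in_set(-1), in_set(5))  # False False
```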

Common confusions

"\varnothing and \{\varnothing\} are the same." They are not. \varnothing is the empty set — a set with no elements. \{\varnothing\} is a singleton set whose only element is the empty set — so it is a set with exactly one element, and that one element happens to be a set itself. The first has cardinality 0; the second has cardinality 1. They are completely different objects.
"\in and \subseteq mean the same thing." No. \in is the membership relation: "x is one of the elements of A." \subseteq is the subset relation: "A is one of the subsets of B." A typical mistake: writing \{1\} \in \{1, 2, 3\}, which is false (the element \{1\} is not in \{1, 2, 3\} — only the number 1 is). The correct statement is \{1\} \subseteq \{1, 2, 3\} or 1 \in \{1, 2, 3\}.
"(2, 5) is an ordered pair." It is, in geometry and coordinate geometry. But in interval notation, (2, 5) is the open interval — the set of all real numbers strictly between 2 and 5. The same notation means two completely different things in different contexts, and you have to read from the surrounding text which one is meant. When in doubt, check whether the symbol is being treated as a point (ordered pair) or a set (interval).
"Every set must have at least one element." No — the empty set \varnothing has zero elements and is a perfectly good set. In fact, it is the unique set with that property, and it is a subset of every other set.
"A set can have \sqrt{2} as an element only if it is a set of irrationals." No — a set can contain anything, and the kind of object an element is doesn't have to match the other elements. The set \{1, \sqrt{2}, \pi, -3\} is a perfectly valid four-element set containing one natural number, one irrational, one transcendental, and one negative integer all at once. Sets are heterogeneous by default.
"The cardinality of \mathbb{N} is the same as the cardinality of \mathbb{R}, because both are infinite." No — they are both infinite, but they are different infinities. \mathbb{R} is in a precise sense a bigger infinity than \mathbb{N}, even though both are infinite. This is one of the most surprising results in mathematics, and it is sketched in the going-deeper section below.

Going deeper

If you came here for the basics of set notation and how to use it, you have everything you need. The rest of this section is for readers who want to see how the simple-looking ideas above lead to some of the deepest results in mathematics: the existence of different sizes of infinity, and a famous puzzle about the set of all sets that do not contain themselves.

Different sizes of infinity

In the nineteenth century, the German mathematician Georg Cantor asked an apparently silly question: are some infinite sets bigger than other infinite sets? His answer turned out to be yes, and the proof is one of the most elegant arguments in all of mathematics.

The key idea is to compare infinite sets by trying to pair them up. Two sets have the same size if you can match every element of the first set to a unique element of the second, and vice versa. For finite sets this just recovers the usual count. For infinite sets it does something stranger: \mathbb{N} and \mathbb{Z} turn out to have the same size, even though \mathbb{Z} "looks twice as big," because you can pair them up: 1 \leftrightarrow 0, 2 \leftrightarrow 1, 3 \leftrightarrow -1, 4 \leftrightarrow 2, 5 \leftrightarrow -2, and so on. Every integer eventually gets matched to a natural number, and vice versa, so they have the same cardinality. Similarly, \mathbb{N} and \mathbb{Q} have the same size — the rationals can be enumerated in a clever zig-zag pattern that hits every fraction exactly once.
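
The pairing between \mathbb{N} and \mathbb{Z} described above can be written as an explicit rule: even naturals go to the positive integers, odd naturals go to zero and the negatives. The function names below are hypothetical, and the check at the end only covers a finite range, so it illustrates the pairing without proving it.

```python
def nat_to_int(n):
    """Send the natural number n (1, 2, 3, ...) to its partner integer."""
    return n // 2 if n % 2 == 0 else -(n // 2)

def int_to_nat(z):
    """Inverse pairing: send the integer z back to its natural number."""
    return 2 * z if z > 0 else 1 - 2 * z

print([nat_to_int(n) for n in range(1, 6)])  # [0, 1, -1, 2, -2]

# Going there and back returns the starting point, in both directions.
round_trip_ok = all(int_to_nat(nat_to_int(n)) == n for n in range(1, 1001))
print(round_trip_ok)  # True
```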

But \mathbb{N} and \mathbb{R} do not have the same size. Cantor proved that no matter what pairing you propose between \mathbb{N} and \mathbb{R}, there is always at least one real number left out. The proof is the famous diagonal argument: assume you have a complete list of real numbers, and then construct a new real number whose n-th digit differs from the n-th digit of the n-th number on your list. The new number cannot be on the list (it differs from every entry in at least one place), so the list was incomplete after all. Therefore \mathbb{R} is "bigger" than \mathbb{N} in a precise sense — there is no way to match them up.
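
The diagonal construction can be tried on a finite stand-in for the list. In the sketch below each row holds the first few decimal digits of a made-up number between 0 and 1, and `diagonal_digits` (a hypothetical helper) changes the n-th digit of the n-th row. Using only the digits 4 and 5 for the new number is a common way to sidestep the fact that some reals have two decimal expansions, such as 0.4999... and 0.5.

```python
def diagonal_digits(listed):
    """Given rows of decimal digits, build a row that differs from row n at position n."""
    return [4 if row[n] == 5 else 5 for n, row in enumerate(listed)]

# A hypothetical start of a "complete list": the first four digits
# of four made-up real numbers between 0 and 1.
listed = [
    [1, 4, 1, 5],
    [5, 5, 5, 5],
    [0, 0, 0, 0],
    [7, 1, 8, 2],
]

new_digits = diagonal_digits(listed)
print(new_digits)            # [5, 4, 5, 5]
print(new_digits in listed)  # False
```

The real argument applies the same rule to an infinite list, which no program can hold in full; the finite version only shows why the new number must disagree with every row.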

Even more surprising: the power set of any infinite set is strictly bigger than the set itself. So \mathcal{P}(\mathbb{N}) is bigger than \mathbb{N}, and \mathcal{P}(\mathcal{P}(\mathbb{N})) is bigger still, and so on. There is no largest infinity — the sequence of infinities goes on forever, each one strictly larger than the last.

This is the kind of result that should not be possible in a chapter that started with a basket of fruit, but it is — and the path from the basket to here is just careful application of the few simple rules in this article.

Russell's paradox

Here is a puzzle to think about. Let R be the set of all sets that are not members of themselves — that is, R = \{X \mid X \notin X\}. Most everyday sets satisfy this: the set \{1, 2, 3\} is not a member of itself, because its members are 1, 2, and 3, and the set itself is not one of them.

Now ask: is R a member of R?

If R \in R, then by the definition of R, R must satisfy "is not a member of itself" — which means R \notin R. Contradiction.

If R \notin R, then R satisfies the membership condition for R, which means R \in R. Contradiction again.

So both answers lead to contradictions, which means the very idea of "the set of all sets not containing themselves" is broken. This is Russell's paradox, discovered by Bertrand Russell in 1901, and it nearly destroyed the foundations of mathematics at the time. The fix was to be more careful about which collections of objects are allowed to be sets — to disallow self-referential definitions of the kind that produced R. Modern mathematics uses a system of axioms (Zermelo-Fraenkel set theory) that rules out the paradox by being precise about how new sets can be built from old ones.

The puzzle is worth knowing about because it shows that "a set is just any collection of objects" is not quite a safe definition. Sets are subtler than they look, and a chapter on sets in the early years of school can only be the comfortable, paradox-free version of the story.

Where this leads next

Sets are the universal language of higher mathematics — every later chapter of the wiki uses them constantly.

Set Operations — the next chapter, covering union, intersection, difference, complement, Venn diagrams, and De Morgan's laws.
Number Systems — the symbols \mathbb{N}, \mathbb{Z}, \mathbb{Q}, \mathbb{R} that you saw informally there are formally sets, and now you have the language to talk about them precisely.
Operations and Properties — the algebraic structures (closure, identity, inverse) are defined as properties of sets together with operations on them.
Real Numbers and Their Properties — where the field axioms are stated as properties of \mathbb{R} as a set with + and \times.
Relations — a relation on a set is itself a set (a set of ordered pairs), and functions are relations of a particular kind.