Before You Compare Two Sets — Strip Duplicates, Ignore Order, Write the Canonical Form

A JEE-style question says: "Are A = \{2, 1, 2, 3, 1\} and B = \{x \in \mathbb{N} \mid x \leq 3\} equal?" A student who tries to compare the expressions side by side gets stuck — they look completely different. One has five numbers written out with repeats, the other has a logical condition and no listed elements at all. It looks like a trick.

There is no trick. There is only a habit you haven't picked up yet: before comparing two sets, rewrite each of them in canonical form, meaning roster form with duplicates stripped and elements sorted. Two finite sets are equal if and only if their canonical forms are identical. That is the whole comparison test, and for finite sets it never fails.

This article gives you the habit, shows why it works, and walks through five examples where it instantly cleans up confusion.

What canonical form means

The canonical form of a finite set is its roster form written with:

Every element listed exactly once (no duplicates in the narration).
The elements in a consistent order — numerical order for numbers, alphabetical for letters, or any other fixed rule you pick and stick to.

For the messy expression \{2, 1, 2, 3, 1\}, the canonical form is \{1, 2, 3\} — duplicates gone, sorted ascending. For the vowels \{u, a, e, i, o\}, the canonical form is \{a, e, i, o, u\}. For B = \{x \in \mathbb{N} \mid x \leq 3\}, you first unpack the set-builder form into its elements — \{1, 2, 3\} — and that already happens to be canonical.

Now the comparison A = B collapses to \{1, 2, 3\} = \{1, 2, 3\}. Done.
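As a quick illustration, the two clean-up rules can be mimicked in Python, where the built-in set type drops repeats and sorted fixes the order. This is only a sketch: the helper name canonical_form is made up here, and it assumes the elements can be compared with one another (all numbers, or all strings).

```python
def canonical_form(elements):
    """Return each element once, in ascending order, as a tuple."""
    # set() discards repeats; sorted() imposes one fixed order.
    return tuple(sorted(set(elements)))


messy_numbers = [2, 1, 2, 3, 1]
vowels = ["u", "a", "e", "i", "o"]

print(canonical_form(messy_numbers))  # (1, 2, 3)
print(canonical_form(vowels))         # ('a', 'e', 'i', 'o', 'u')
```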

Why canonical form is a lossless rewrite

Recall the bag picture: a set is defined by which objects are inside. Two properties follow.

Order doesn't matter. Writing the elements in a different order doesn't change the bag, because the bag doesn't record "first" or "second."
Duplicates don't matter. Writing an element twice in the narration doesn't put two copies in the bag — sets don't hold multiple copies.

So stripping duplicates and sorting is not losing information. It is just rewriting the same bag in a standardised narration. The underlying set — the thing the notation is describing — is untouched. This is why canonical form is the comparison form: every "different-looking expression" for the same set collapses to a single canonical string.

Why this works: by the Axiom of Extensionality, two sets are equal iff they have the same elements. Canonical form makes "the same elements" immediately visible by ordering and deduplicating the listed elements. If the canonical strings match, the sets match. If they differ on any element, the sets are unequal.
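The link between "same elements" and "same canonical form" can be checked on small cases. The sketch below writes rosters as Python lists so that repeats survive, then tests extensionality directly (membership in both directions) and compares that verdict with the canonical-form comparison. The function name same_elements and the sample list C are illustrative choices, not anything from the text above.

```python
def same_elements(a, b):
    """Extensionality: every element of a is in b, and vice versa."""
    return all(x in b for x in a) and all(y in a for y in b)


A = [2, 1, 2, 3, 1]   # roster with repeats
B = [1, 2, 3]
C = [1, 2, 4]         # differs from A on one element

print(same_elements(A, B))                # True
print(sorted(set(A)) == sorted(set(B)))   # True, canonical forms agree
print(same_elements(A, C))                # False
print(sorted(set(A)) == sorted(set(C)))   # False
```

In both cases the membership test and the canonical-form test give the same answer, which is what the argument above predicts.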

Figure: three different-looking set expressions and their shared canonical form. Any question "are these two sets equal?" reduces to "do their canonical forms match?"

The three-step canonicalisation

Given any set expression, turn it into canonical form with three steps:

Step 1 — Expand. If the expression is in set-builder form, unpack it into explicit elements. \{x \in \mathbb{N} \mid x \leq 3\} expands to \{1, 2, 3\}. \{y \mid y^2 = 4\}, with y ranging over the real numbers, expands to \{-2, 2\}.

Step 2 — Deduplicate. Walk the list and delete any element that already appeared earlier. \{2, 1, 2, 3, 1\} after deduplication is \{2, 1, 3\}.

Step 3 — Sort. Arrange the elements in a consistent order. For numbers, ascending is the universal convention. \{2, 1, 3\} sorts to \{1, 2, 3\}.

Three mechanical passes, no judgement calls. The output is the canonical form, and comparing two canonical forms is just string equality.
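The three steps can be sketched one function per step. This is an illustration under stated assumptions rather than a definitive implementation: the function names are invented, the elements are assumed to be sortable, and the expand step can only search a finite universe that you supply.

```python
def expand(predicate, universe):
    """Step 1: list the members of `universe` that satisfy `predicate`."""
    return [x for x in universe if predicate(x)]


def deduplicate(items):
    """Step 2: keep the first occurrence of each element, drop later ones."""
    seen = []
    for item in items:
        if item not in seen:
            seen.append(item)
    return seen


def sort_elements(items):
    """Step 3: arrange the elements in ascending order."""
    return sorted(items)


# {x in N | x <= 3}, searching N only up to 10 (an assumed finite window)
step1 = expand(lambda x: x <= 3, range(1, 11))
print(step1)   # [1, 2, 3]

step2 = deduplicate([2, 1, 2, 3, 1])
print(step2)   # [2, 1, 3]

step3 = sort_elements(step2)
print(step3)   # [1, 2, 3]
```

The intermediate results match the ones written out in the steps above, including the unsorted \{2, 1, 3\} after deduplication.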

Worked examples

Example 1: simple equality

Are \{5, 3, 5, 1, 3\} and \{1, 3, 5\} equal?

Canonicalise the first. Expand: already roster form. Deduplicate: \{5, 3, 1\}. Sort: \{1, 3, 5\}. That matches the second expression exactly. Equal.

Example 2: set-builder on both sides

Are \{x \in \mathbb{Z} \mid -2 \leq x \leq 2\} and \{x \in \mathbb{Z} \mid x^2 \leq 4\} equal?

Expand the first: integers from -2 to 2 inclusive, which is \{-2, -1, 0, 1, 2\}. Sort: already sorted. Canonical form: \{-2, -1, 0, 1, 2\}.

Expand the second: integers x with x^2 \leq 4. That means -2 \leq x \leq 2. Same as above. Canonical form: \{-2, -1, 0, 1, 2\}.

The two canonical forms match. Equal.
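Example 2 can be replayed mechanically. The sketch below assumes a search window of integers from -10 to 10; that window is a choice made for this illustration, and it is safe here because any integer outside it fails both conditions.

```python
# Assumed search window. Any integer outside [-10, 10] has a square
# above 4 and lies outside [-2, 2], so no member is missed.
window = range(-10, 11)

first = sorted({x for x in window if -2 <= x <= 2})
second = sorted({x for x in window if x * x <= 4})

print(first)             # [-2, -1, 0, 1, 2]
print(second)            # [-2, -1, 0, 1, 2]
print(first == second)   # True
```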

Example 3: trap, similar-looking but different

Are \{1, 2, 3\} and \{\{1\}, \{2\}, \{3\}\} equal?

Canonicalise each. The first is already canonical. The second has three elements — the sets \{1\}, \{2\}, \{3\} — and is also already canonical. But its canonical form is \{\{1\}, \{2\}, \{3\}\}, not \{1, 2, 3\}. Not equal. The first is a set of three numbers; the second is a set of three singleton sets. Different levels, as covered in Elements and Subsets Live at Different Levels.
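The level difference shows up directly if the two sets are modelled in Python. A Python set cannot contain ordinary sets, so this sketch uses frozenset for the inner singletons; that is a modelling choice for the illustration, not part of the mathematics.

```python
numbers = {1, 2, 3}
singletons = {frozenset({1}), frozenset({2}), frozenset({3})}

print(numbers == singletons)           # False
print(len(numbers), len(singletons))   # 3 3

# Same size, but the members live at different levels.
print(1 in numbers)                    # True
print(1 in singletons)                 # False
print(frozenset({1}) in singletons)    # True
```

Both sets have three elements, yet no element of one is an element of the other.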

Example 4: repeat with a twist

Are \{\text{mango}, \text{Mango}, \text{mango}\} and \{\text{mango}\} equal?

It depends on whether "mango" and "Mango" are considered the same object. In mathematics, capitalisation usually matters for named objects (they are different identifiers), so these are two different labels. After canonicalisation, the first set is \{\text{Mango}, \text{mango}\}, with two elements (capital letters come before lowercase in standard ASCII ordering), and the second is \{\text{mango}\}, with one. Not equal.

The takeaway: deduplication is by object identity, not by "looks similar." If the problem says two elements are the same (for instance, "1 and 2/2"), deduplicate. Otherwise, treat distinct-looking elements as distinct.
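A short sketch of the same point: Python strings are case-sensitive, so the two spellings stay distinct, while two ways of writing the same number collapse to one element. Using Fraction from the standard library to stand in for "1 and 2/2" is an assumption made for this illustration.

```python
from fractions import Fraction

fruit = {"mango", "Mango", "mango"}
print(sorted(fruit))         # ['Mango', 'mango']
print(fruit == {"mango"})    # False

# 1 and 2/2 name the same number, so they count as one element.
values = {Fraction(1), Fraction(2, 2)}
print(len(values))           # 1
```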

Example 5: a question that looks long but isn't

Are A = \{2k \mid k \in \mathbb{N}, \, k \leq 5\} and B = \{2, 4, 6, 8, 10\} equal?

Expand A. As k runs over 1, 2, 3, 4, 5, 2k runs over 2, 4, 6, 8, 10. So A = \{2, 4, 6, 8, 10\}, already canonical. B is already canonical. Equal.

Notice how short the answer is once you commit to canonicalising. Without the habit, you might stare at the expressions and wonder whether the set-builder form hides some subtlety. With the habit, you just expand, deduplicate, sort, compare. Ten seconds.

The habit, formalised

In every set-equality or set-comparison problem, do this sequence before attempting any algebra:

Write each set in canonical form on your rough sheet.
Compare the canonical forms character by character.
If they match, the sets are equal. If they differ on any element, the sets are unequal.

For infinite sets you cannot list every element, so step 1 becomes "describe each set in a canonical set-builder form." For instance, \{x \in \mathbb{Z} \mid x \text{ is even}\} and \{2n \mid n \in \mathbb{Z}\} both describe the even integers: two different set-builder forms for the same underlying set. The comparison happens at the level of "which elements does each describe," not "do the expressions look identical."
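A computer cannot enumerate an infinite set either, but it can compare the two descriptions on a finite window as a sanity check. The sketch below does that for the even integers. The window from -50 to 50 is an arbitrary choice, and agreement on a window is evidence, not a proof; the proof is the observation that x is even exactly when x = 2n for some integer n.

```python
def is_even(x):
    return x % 2 == 0


window = range(-50, 51)   # arbitrary finite window, chosen for this sketch

by_condition = {x for x in window if is_even(x)}
# n runs over -25..25 so that 2n covers exactly the same window
by_formula = {2 * n for n in range(-25, 26)}

print(by_condition == by_formula)   # True
```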

A self-test

Decide whether each pair of sets is equal. Write canonical forms on scratch and compare.

\{1, 2, 3, 3, 2, 1\} and \{1, 2, 3\}
\{a, b, c\} and \{c, b, a, a\}
\{n \in \mathbb{N} \mid n^2 \leq 9\} and \{1, 2, 3\}
\{x \in \mathbb{R} \mid x^2 = 1\} and \{1\}
\{\varnothing\} and \varnothing
\{\{a\}\} and \{a\}

Answers:

Both canonicalise to \{1, 2, 3\}. Equal.
Both canonicalise to \{a, b, c\}. Equal.
First expands to \{1, 2, 3\}: with \mathbb{N} starting at 1, as throughout this article, the natural numbers with square \leq 9 are 1, 2 and 3 (1^2 = 1, 2^2 = 4, 3^2 = 9, while 4^2 = 16 is too large). Equal.
First expands to \{-1, 1\}. Second is \{1\}. Canonical forms differ. Not equal.
\{\varnothing\} is a singleton containing the empty set — canonical form \{\varnothing\}, one element. \varnothing has zero elements, canonical form \varnothing or \{\}. Different cardinality, different canonical form. Not equal.
\{\{a\}\} is a singleton containing the singleton \{a\}. \{a\} is a singleton containing a. The first set's only element is the set \{a\}; the second set's only element is the object a. Not equal.
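If you want to check your answers mechanically, the six pairs can be modelled as follows. Several modelling choices here are assumptions of the sketch: the letters are Python strings, nested sets and the empty set are frozensets, the natural numbers are searched only up to 10, and the real solutions of x^2 = 1 are entered by hand.

```python
EMPTY = frozenset()

pairs = [
    ({1, 2, 3, 3, 2, 1}, {1, 2, 3}),
    ({"a", "b", "c"}, {"c", "b", "a", "a"}),
    ({n for n in range(1, 11) if n * n <= 9}, {1, 2, 3}),
    ({-1, 1}, {1}),                    # real solutions of x^2 = 1, by hand
    (frozenset({EMPTY}), EMPTY),       # {empty set} versus empty set
    (frozenset({frozenset({"a"})}), frozenset({"a"})),
]

results = [left == right for left, right in pairs]
print(results)   # [True, True, True, False, False, False]
```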

Why the level traps keep catching people: the canonical-form habit works cleanly when the elements are listed at the same level. When you are comparing things at different levels (an object vs a set, the empty set vs a set containing the empty set), you first need to identify the levels — then canonicalise at each level — and only then compare.

Are the following three sets all equal?

A = \{x \in \mathbb{R} \mid (x-1)(x-2)(x-3) = 0\}, \quad B = \{1, 2, 2, 3, 1\}, \quad C = \{n \in \mathbb{N} \mid 1 \leq n \leq 3\}.

Canonicalise each.

A: The polynomial equation (x-1)(x-2)(x-3) = 0 has roots x = 1, 2, 3. Canonical form: \{1, 2, 3\}.

B: Already roster form with repeats. Deduplicate: \{1, 2, 3\}. Sort: \{1, 2, 3\}.

C: Natural numbers from 1 to 3: \{1, 2, 3\}. Already canonical.

All three canonical forms are \{1, 2, 3\}. Yes, all three sets are equal.

Why: three different-looking set expressions — one polynomial equation, one roster with repeats, one set-builder — all describe the same underlying bag. Canonicalisation makes the agreement immediate; without it, the comparison would feel like comparing three unrelated expressions.
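The three-way comparison can also be run as a check. One assumption to flag: set A is defined over the real numbers, which cannot be searched exhaustively, so the sketch tests only integer candidates in an assumed window. That is enough here because a cubic has at most three real roots, and the search finds three.

```python
def p(x):
    """The polynomial (x - 1)(x - 2)(x - 3)."""
    return (x - 1) * (x - 2) * (x - 3)


# Integer candidates only, in an assumed window from -10 to 10.
# A cubic has at most three real roots, so three hits means all are found.
A = sorted({x for x in range(-10, 11) if p(x) == 0})
B = sorted(set([1, 2, 2, 3, 1]))
C = sorted({n for n in range(1, 11) if 1 <= n <= 3})

print(A, B, C)       # [1, 2, 3] [1, 2, 3] [1, 2, 3]
print(A == B == C)   # True
```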

The payoff

Canonicalisation is the simplest possible equality check, and it works on every finite set whose elements you can list. Making it a reflex saves minutes in exams and, more importantly, protects you from the fallacy that many set questions exploit: "these expressions look different, so the sets must be different."

In higher mathematics, the same idea scales up. Two finite groups are "the same" iff their multiplication tables match (up to renaming). Two graphs are "the same" iff their adjacency structures match (up to renaming). The general pattern, compare canonical forms rather than surface expressions, starts here, with sets.