Off-by-one on range boundaries
Wrong move: Loop endpoints miss first/last candidate.
Usually fails on: Fails on minimal arrays and exact-boundary answers.
Fix: Re-derive loops from inclusive/exclusive ranges before coding.
Break down a hard problem into reliable checkpoints, edge-case handling, and complexity trade-offs.
Table: ProductPurchases
+-------------+------+ | Column Name | Type | +-------------+------+ | user_id | int | | product_id | int | | quantity | int | +-------------+------+ (user_id, product_id) is the unique identifier for this table. Each row represents a purchase of a product by a user in a specific quantity.
Table: ProductInfo
+-------------+---------+ | Column Name | Type | +-------------+---------+ | product_id | int | | category | varchar | | price | decimal | +-------------+---------+ product_id is the unique identifier for this table. Each row assigns a category and price to a product.
Amazon wants to understand shopping patterns across product categories. Write a solution to:
category1 < category2)A category pair is considered reportable if at least 3 different customers have purchased products from both categories.
Return the result table of reportable category pairs ordered by customer_count in descending order, and in case of a tie, by category1 in ascending order lexicographically, and then by category2 in ascending order.
The result format is in the following example.
Example:
Input:
ProductPurchases table:
+---------+------------+----------+ | user_id | product_id | quantity | +---------+------------+----------+ | 1 | 101 | 2 | | 1 | 102 | 1 | | 1 | 201 | 3 | | 1 | 301 | 1 | | 2 | 101 | 1 | | 2 | 102 | 2 | | 2 | 103 | 1 | | 2 | 201 | 5 | | 3 | 101 | 2 | | 3 | 103 | 1 | | 3 | 301 | 4 | | 3 | 401 | 2 | | 4 | 101 | 1 | | 4 | 201 | 3 | | 4 | 301 | 1 | | 4 | 401 | 2 | | 5 | 102 | 2 | | 5 | 103 | 1 | | 5 | 201 | 2 | | 5 | 202 | 3 | +---------+------------+----------+
ProductInfo table:
+------------+-------------+-------+ | product_id | category | price | +------------+-------------+-------+ | 101 | Electronics | 100 | | 102 | Books | 20 | | 103 | Books | 35 | | 201 | Clothing | 45 | | 202 | Clothing | 60 | | 301 | Sports | 75 | | 401 | Kitchen | 50 | +------------+-------------+-------+
Output:
+-------------+-------------+----------------+ | category1 | category2 | customer_count | +-------------+-------------+----------------+ | Books | Clothing | 3 | | Books | Electronics | 3 | | Clothing | Electronics | 3 | | Electronics | Sports | 3 | +-------------+-------------+----------------+
Explanation:
The result is ordered by customer_count in descending order. Since all pairs have the same customer_count of 3, they are ordered by category1 (then category2) in ascending order.
Problem summary: Table: ProductPurchases +-------------+------+ | Column Name | Type | +-------------+------+ | user_id | int | | product_id | int | | quantity | int | +-------------+------+ (user_id, product_id) is the unique identifier for this table. Each row represents a purchase of a product by a user in a specific quantity. Table: ProductInfo +-------------+---------+ | Column Name | Type | +-------------+---------+ | product_id | int | | category | varchar | | price | decimal | +-------------+---------+ product_id is the unique identifier for this table. Each row assigns a category and price to a product. Amazon wants to understand shopping patterns across product categories. Write a solution to: Find all category pairs (where category1 < category2) For each category pair, determine the number of unique customers who purchased products from both categories A category pair is considered reportable
Start with the most direct exhaustive search. That gives a correctness anchor before optimizing.
Pattern signal: General problem-solving
{"headers":{"ProductPurchases":["user_id","product_id","quantity"],"ProductInfo":["product_id","category","price"]},"rows":{"ProductPurchases":[[1,101,2],[1,102,1],[1,201,3],[1,301,1],[2,101,1],[2,102,2],[2,103,1],[2,201,5],[3,101,2],[3,103,1],[3,301,4],[3,401,2],[4,101,1],[4,201,3],[4,301,1],[4,401,2],[5,102,2],[5,103,1],[5,201,2],[5,202,3]],"ProductInfo":[[101,"Electronics",100],[102,"Books",20],[103,"Books",35],[201,"Clothing",45],[202,"Clothing",60],[301,"Sports",75],[401,"Kitchen",50]]}}Source-backed implementations are provided below for direct study and interview prep.
// Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
// Auto-generated Java example from py.
class Solution {
public void exampleSolution() {
}
}
// Reference (py):
// # Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
// import pandas as pd
//
//
// def find_category_recommendation_pairs(
// product_purchases: pd.DataFrame, product_info: pd.DataFrame
// ) -> pd.DataFrame:
// df = product_purchases[["user_id", "product_id"]].merge(
// product_info[["product_id", "category"]], on="product_id", how="inner"
// )
// user_category = df.drop_duplicates(subset=["user_id", "category"])
// pair_per_user = (
// user_category.merge(user_category, on="user_id")
// .query("category_x < category_y")
// .rename(columns={"category_x": "category1", "category_y": "category2"})
// )
// pair_counts = (
// pair_per_user.groupby(["category1", "category2"])["user_id"]
// .nunique()
// .reset_index(name="customer_count")
// )
// result = (
// pair_counts.query("customer_count >= 3")
// .sort_values(
// ["customer_count", "category1", "category2"], ascending=[False, True, True]
// )
// .reset_index(drop=True)
// )
// return result
// Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
// Auto-generated Go example from py.
func exampleSolution() {
}
// Reference (py):
// # Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
// import pandas as pd
//
//
// def find_category_recommendation_pairs(
// product_purchases: pd.DataFrame, product_info: pd.DataFrame
// ) -> pd.DataFrame:
// df = product_purchases[["user_id", "product_id"]].merge(
// product_info[["product_id", "category"]], on="product_id", how="inner"
// )
// user_category = df.drop_duplicates(subset=["user_id", "category"])
// pair_per_user = (
// user_category.merge(user_category, on="user_id")
// .query("category_x < category_y")
// .rename(columns={"category_x": "category1", "category_y": "category2"})
// )
// pair_counts = (
// pair_per_user.groupby(["category1", "category2"])["user_id"]
// .nunique()
// .reset_index(name="customer_count")
// )
// result = (
// pair_counts.query("customer_count >= 3")
// .sort_values(
// ["customer_count", "category1", "category2"], ascending=[False, True, True]
// )
// .reset_index(drop=True)
// )
// return result
# Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
import pandas as pd
def find_category_recommendation_pairs(
product_purchases: pd.DataFrame, product_info: pd.DataFrame
) -> pd.DataFrame:
df = product_purchases[["user_id", "product_id"]].merge(
product_info[["product_id", "category"]], on="product_id", how="inner"
)
user_category = df.drop_duplicates(subset=["user_id", "category"])
pair_per_user = (
user_category.merge(user_category, on="user_id")
.query("category_x < category_y")
.rename(columns={"category_x": "category1", "category_y": "category2"})
)
pair_counts = (
pair_per_user.groupby(["category1", "category2"])["user_id"]
.nunique()
.reset_index(name="customer_count")
)
result = (
pair_counts.query("customer_count >= 3")
.sort_values(
["customer_count", "category1", "category2"], ascending=[False, True, True]
)
.reset_index(drop=True)
)
return result
// Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
// Rust example auto-generated from py reference.
// Replace the signature and local types with the exact LeetCode harness for this problem.
impl Solution {
pub fn rust_example() {
// Port the logic from the reference block below.
}
}
// Reference (py):
// # Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
// import pandas as pd
//
//
// def find_category_recommendation_pairs(
// product_purchases: pd.DataFrame, product_info: pd.DataFrame
// ) -> pd.DataFrame:
// df = product_purchases[["user_id", "product_id"]].merge(
// product_info[["product_id", "category"]], on="product_id", how="inner"
// )
// user_category = df.drop_duplicates(subset=["user_id", "category"])
// pair_per_user = (
// user_category.merge(user_category, on="user_id")
// .query("category_x < category_y")
// .rename(columns={"category_x": "category1", "category_y": "category2"})
// )
// pair_counts = (
// pair_per_user.groupby(["category1", "category2"])["user_id"]
// .nunique()
// .reset_index(name="customer_count")
// )
// result = (
// pair_counts.query("customer_count >= 3")
// .sort_values(
// ["customer_count", "category1", "category2"], ascending=[False, True, True]
// )
// .reset_index(drop=True)
// )
// return result
// Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
// Auto-generated TypeScript example from py.
function exampleSolution(): void {
}
// Reference (py):
// # Accepted solution for LeetCode #3554: Find Category Recommendation Pairs
// import pandas as pd
//
//
// def find_category_recommendation_pairs(
// product_purchases: pd.DataFrame, product_info: pd.DataFrame
// ) -> pd.DataFrame:
// df = product_purchases[["user_id", "product_id"]].merge(
// product_info[["product_id", "category"]], on="product_id", how="inner"
// )
// user_category = df.drop_duplicates(subset=["user_id", "category"])
// pair_per_user = (
// user_category.merge(user_category, on="user_id")
// .query("category_x < category_y")
// .rename(columns={"category_x": "category1", "category_y": "category2"})
// )
// pair_counts = (
// pair_per_user.groupby(["category1", "category2"])["user_id"]
// .nunique()
// .reset_index(name="customer_count")
// )
// result = (
// pair_counts.query("customer_count >= 3")
// .sort_values(
// ["customer_count", "category1", "category2"], ascending=[False, True, True]
// )
// .reset_index(drop=True)
// )
// return result
Use this to step through a reusable interview workflow for this problem.
Two nested loops check every pair or subarray. The outer loop fixes a starting point, the inner loop extends or searches. For n elements this gives up to n²/2 operations. No extra space, but the quadratic time is prohibitive for large inputs.
Most array problems have an O(n²) brute force (nested loops) and an O(n) optimal (single pass with clever state tracking). The key is identifying what information to maintain as you scan: a running max, a prefix sum, a hash map of seen values, or two pointers.
Review these before coding to avoid predictable interview regressions.
Wrong move: Loop endpoints miss first/last candidate.
Usually fails on: Fails on minimal arrays and exact-boundary answers.
Fix: Re-derive loops from inclusive/exclusive ranges before coding.