My very first project at my current company was to migrate a system written in PHP to Java. There is one function that plays an important role in the system. It performs the following tasks:
Given a sorted list A and a set B, both containing Strings. A and B are loaded from Redis and cached in memory. List A remains constant between requests, while set B changes.
- Find the intersection of A and B
- Create list C containing the intersecting items, ordered as they appear in A
- Append all remaining items from B to C
- Return the reverse of C
private static List<String> sortItems(List<String> sortedItemList, Set<String> items) {
if (items == null || items.isEmpty()) {
return Collections.emptyList();
}
if (sortedItemList == null || sortedItemList.isEmpty()) {
return new ArrayList<>(items);
}
// Find intersection, preserving order from sortedItemList
List<String> intersection = new ArrayList<>();
for (String item : sortedItemList) {
if (items.contains(item)) {
intersection.add(item);
}
}
List<String> reversedList = new ArrayList<>(intersection);
Set<String> sortedSet = new HashSet<>(intersection);
// Add remaining items
for (String s : items) {
if (!sortedSet.contains(s)) {
reversedList.add(s);
}
}
// Reverse the list to get the final desired order
Collections.reverse(reversedList);
return reversedList;
}
Performance Observations
Through profiling and observation, I noted that the sizes of A and B could vary significantly between requests. Specifically, two main scenarios emerged:
Case 1: A is very large (2kâ10k elements) while B is very small (50â100 elements).
Case 2: The sizes of A and B are more comparable (50â2k elements).
This led to two key observations about the original implementation:
Second: Calling Collections.reverse() seems inefficient. Since we already have a sorted list, we can build the reversed list using a more clever approach. The Collections.reverse() method performs swaps on the list elements, and swapping can be relatively slow.
But... why does this matter?
Developing an Optimized Solution
My goal was to create a solution that performed well across both Case 1 and Case 2 by addressing the observed inefficiencies. This led to developing two distinct implementations and dynamically choosing between them.
Solution Approach:
The core idea is to use different strategies based on the relative sizes of A and B.
- For Case 1 (small B, large A), iterating through B to find the intersection is more efficient than iterating through A.
- For Case 2 (comparable sizes), iterating through A (like the original O(n + m) approach) is generally fine, but we can optimize by avoiding the separate Collections.reverse() call.
private static List<String> sortItemFromSet_givenMap(Map<String, Integer> item2Order, Set<String> items) {
if (items == null || items.isEmpty()) {
// return empty list if no items are provided
return Collections.emptyList();
}
if (item2Order == null || item2Order.isEmpty()) {
// return items as is if no orders are provided
return new ArrayList<>(items);
}
List<ItemOrder> itemOrderList = new ArrayList<>();
Set<String> addedItem = new HashSet<>();
for (String deal : items) {
// find intersection
if (item2Order.containsKey(deal) && addedItem.add(deal)) {
itemOrderList.add(new ItemOrder(deal, item2Order.get(deal)));
}
}
if (itemOrderList.isEmpty()) {
return new ArrayList<>(items);
}
// sort by order
itemOrderList.sort(Comparator.comparingInt(ItemOrder::getOrder));
String[] result = new String[items.size()];
int currentIndex = items.size() - 1;
// fill the result array with items in order
for (ItemOrder item : itemOrderList) {
result[currentIndex--] = item.getItem();
}
// fill the result array with items do not in the intersection
for (String deal : items) {
if (!addedItem.contains(deal)) {
result[currentIndex--] = deal;
}
}
return Arrays.asList(result);
}
@Getter
@AllArgsConstructor
static class ItemOrder {
private String item;
private int order;
}
private static List<String> sortItemFromSet_givenList(List<String> sortedItems, Set<String> items) {
if (items == null || items.isEmpty()) {
// return empty list if no items are provided
return Collections.emptyList();
}
if (sortedItems == null || sortedItems.isEmpty()) {
return new ArrayList<>(items);
}
int size = items.size();
int currentIndex = size - 1;
String[] result = new String[size];
Set<String> addedItems = new HashSet<>(size);
// Iterate over sortedItems and items in one go
for (String item : sortedItems) {
if (items.contains(item) && addedItems.add(item)) {
result[currentIndex--] = item;
}
}
if (addedItems.isEmpty()) {
return new ArrayList<>(items);
}
// Append the remaining items from items not in result
for (String item : items) {
if (!addedItems.contains(item)) {
result[currentIndex--] = item;
}
}
return Arrays.asList(result);
}
@Getter
@AllArgsConstructor
public static class ItemStruct {
private List<String> sortedItemList;
private Map<String, Integer> item2OrderMap;
}
Dynamic Implementation Selection
private static List<String> sortItem(ItemStruct itemStruct, Set<String> items) {
if (items == null || items.isEmpty()) {
return Collections.emptyList();
}
if (itemStruct != null && !itemStruct.getSortedItemList().isEmpty()) {
int sizeOfItems = items.size();
int sizeOfItemSortedList = itemStruct.getSortedItemList().size();
if (sizeOfItems * IntMath.log2(sizeOfItems, RoundingMode.HALF_UP) < (sizeOfItems + sizeOfItemSortedList)) {
return sortItemFromSet_givenMap(dealWithImp.getDealImpWeightMap(), items);
}
return sortItemFromSet_givenList(dealWithImp.getDealImpList(), items);
}
return new ArrayList<>(items);
}
Testing and Results
- Size of A: 7000
- Size of B: 100
- Intersection: 100
- Calls: 10,000
- Old Implementation: 256 ms
- New Implementation: 98 ms (~2.6x faster)
- Size of A: 5000
- Size of B: 5000
- Intersection: 3000
- Calls: 10,000
- Old Implementation: 1255 ms
- New Implementation: 909 ms (~1.4x faster)
Comments
Post a Comment