Case Study

Behavioral Ranking for Automotive Parts Search (PartsTech)


I led product work at PartsTech to introduce behavioral ranking into search using add-to-cart and purchase signals tied to vehicle and query context. The goal was to improve result ordering so search reflected what customers actually choose in real repair workflows, not just static relevance rules. We launched the ranking change behind an A/B test with tight guardrails because this is a fitment-sensitive marketplace. The test remains live through March 2026, and the current read shows a 1 to 2 percent lift in conversion across categories overall.

Role and scope

At PartsTech, I lead product for Search and Recommendations in a B2B automotive marketplace.

This work focused on search ranking quality. We already had relevance logic and business rules in place, but we needed the ranking system to learn more from customer behavior and improve how products were ordered for a given vehicle and query.

Because this is automotive parts search, the work had to balance performance and trust. We needed to improve conversion without creating fitment risk, unstable ranking behavior, or business-side issues around supplier visibility.

The problem

Our ranking logic had good controls, but it still relied heavily on static signals and rules.

That gave us consistency, but it also limited how well search could adapt to real customer behavior. In this marketplace, the best result is not just the best text match. It is often the part customers repeatedly choose for a specific vehicle and repair context, with all the practical tradeoffs that come with price, availability, and preference.

We had enough behavioral signal to do better, especially from add-to-cart and purchase behavior. The challenge was turning that into ranking improvements in a way that was safe, measurable, and easy to tune.

How we did it

1) Start with high-confidence behavioral signals

I worked with engineering and data science to define the first version of behavioral ranking around the strongest signals we had:

  • add-to-cart behavior
  • purchase behavior
  • vehicle context
  • query context

We started there on purpose. These signals were directly tied to customer decisions and business value, and they gave us a practical way to improve ranking without trying to solve everything at once.

2) Build a ranking approach that could handle sparse data

Behavioral ranking works well when signal density is strong, but automotive search also has long-tail queries and sparse contexts.

I focused the team on a ranking strategy that could use behavioral signals where they were strong and fall back safely where they were thin. That meant designing the release with:

  • coverage checks by category and query type
  • stable fallback behavior for low-signal contexts
  • compatibility with existing relevance and business controls
  • a rollout plan that let us evaluate quality before expanding further

This kept the ranking improvements practical and reduced risk during rollout.
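To make the "use signal where it is strong, fall back where it is thin" idea concrete, here is a minimal sketch. It is an illustration only, not the production logic: the names (BehaviorStats, behavioral_score, blended_score), the weights, and the thresholds are all hypothetical, and it assumes fitment filtering has already happened upstream so that only compatible parts are being reordered.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass
class BehaviorStats:
    """Hypothetical per-(vehicle, query, part) counters."""
    impressions: int
    add_to_carts: int
    purchases: int


def behavioral_score(stats: BehaviorStats,
                     prior_rate: float = 0.05,
                     prior_strength: float = 50.0,
                     purchase_weight: float = 3.0) -> float:
    """Smoothed action rate: with few impressions it stays near prior_rate."""
    weighted_actions = stats.add_to_carts + purchase_weight * stats.purchases
    return (weighted_actions + prior_rate * prior_strength) / (
        stats.impressions + prior_strength
    )


def blended_score(relevance: float,
                  stats: Optional[BehaviorStats],
                  min_impressions: int = 20,
                  behavior_weight: float = 0.3) -> float:
    """Blend baseline relevance with behavior; fall back when data is thin."""
    if stats is None or stats.impressions < min_impressions:
        # Low-signal context: keep the existing relevance ordering unchanged.
        return relevance
    return (1 - behavior_weight) * relevance + behavior_weight * behavioral_score(stats)
```

The smoothing term keeps a part with three impressions and one purchase from outranking a part with a long, consistent purchase history, and the impression threshold leaves long-tail contexts on the existing ordering.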

3) Launch as an experiment with tight guardrails

We launched the change as a live A/B test so we could evaluate conversion impact and ranking behavior in production.

I treated the experiment as more than a go/no-go decision. The goal was to establish a repeatable tuning loop so we could keep improving the ranking system over time.

Because this is a fitment-sensitive marketplace, we monitored more than just top-line conversion. We also watched for quality drift, category-level behavior, and any signs that the ranking system was over-weighting popularity in ways that could hurt trust.

Evaluating success

I wanted the team to answer two questions clearly:

  1. Are we improving conversion?
  2. Are we doing it without hurting search quality or trust?

Primary metrics

  • Conversion rate from search
  • Add-to-cart rate from search sessions
  • Category-level conversion movement
  • Query-level engagement behavior

Guardrail metrics

  • Ranking stability across categories
  • Coverage in sparse-data contexts
  • Search quality checks in fitment-sensitive workflows
  • Operational reliability and rollout health
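One simple way to quantify a guardrail like ranking stability is to compare the top results of the baseline and behavioral rankings for the same vehicle and query, grouped by category. The sketch below uses Jaccard overlap of the top k part IDs. This is an assumed metric chosen for illustration, not necessarily the check we ran, and the function names and the 0.5 threshold are hypothetical.

```python
from typing import Dict, List, Tuple


def top_k_overlap(baseline: List[str], candidate: List[str], k: int = 10) -> float:
    """Jaccard overlap between the top-k part IDs of two rankings (0.0 to 1.0)."""
    a, b = set(baseline[:k]), set(candidate[:k])
    if not a and not b:
        return 1.0  # Two empty result sets are treated as identical.
    return len(a & b) / len(a | b)


def flag_unstable_categories(
    rankings_by_category: Dict[str, Tuple[List[str], List[str]]],
    k: int = 10,
    min_overlap: float = 0.5,
) -> List[str]:
    """Return categories whose top-k changed more than the allowed amount."""
    flagged = []
    for category, (baseline, candidate) in rankings_by_category.items():
        if top_k_overlap(baseline, candidate, k) < min_overlap:
            flagged.append(category)
    return sorted(flagged)
```

A flagged category is a prompt for manual review, not proof of a problem: a large reshuffle can be a real improvement, but in fitment-sensitive workflows it is worth a look before expanding the rollout.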

Experimentation approach

We used the A/B test as an active learning loop. The test remains live through March 2026, and we are continuing to monitor performance, review category behavior, and tune the ranking logic as needed.
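For readers who want the arithmetic behind a conversion read, the sketch below computes relative lift and a two-sided two-proportion z-test from raw counts. It uses only the Python standard library. The function name is hypothetical, and any counts passed in are the caller's own; none of the numbers in this case study come from this code.

```python
import math
from typing import Tuple


def conversion_lift(control_conv: int, control_n: int,
                    treat_conv: int, treat_n: int) -> Tuple[float, float, float]:
    """Return (relative_lift, z_score, two_sided_p_value) for two arms."""
    p_c = control_conv / control_n
    p_t = treat_conv / treat_n
    relative_lift = (p_t - p_c) / p_c

    # Pooled standard error under the null hypothesis of equal rates.
    pooled = (control_conv + treat_conv) / (control_n + treat_n)
    se = math.sqrt(pooled * (1 - pooled) * (1 / control_n + 1 / treat_n))
    z = (p_t - p_c) / se

    # Two-sided p-value from the standard normal distribution.
    p_value = math.erfc(abs(z) / math.sqrt(2))
    return relative_lift, z, p_value
```

A lift in the 1 to 2 percent range needs large samples to separate from noise, which is one reason to keep a test running and to read it at the category level as well as overall.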

Outcomes

The experiment is still running, and the current outcome is positive.

  • A/B test remains live through March 2026
  • Current read shows a 1 to 2 percent lift in conversion across categories overall
  • Established a behavioral ranking foundation that can support future learning-to-rank and personalization work

The lift is meaningful because it comes from a ranking change in a complex B2B marketplace where trust and fitment quality matter as much as conversion.

Leadership and org capability impact

A big part of this work was creating alignment across teams on how ranking decisions should be made.

Shared tradeoff framework for ranking decisions

I helped frame ranking as a product system that balances customer relevance, fitment trust, and business controls. That gave product, engineering, and data science a better way to make decisions together, especially when tradeoffs came up around signal quality, rollout scope, and category behavior.

A repeatable ranking improvement loop

This work also helped establish a stronger pattern for search ranking changes:

  • start with high-signal behavioral features
  • launch behind a controlled experiment
  • monitor both conversion and quality guardrails
  • tune iteratively instead of treating ranking as a one-time release

That made the program easier to scale and easier for partner teams to trust.

Foundation for broader search and marketplace strategy

Behavioral ranking also created a stronger base for adjacent work, including:

  • broader learning-to-rank expansion
  • personalization
  • supplier and monetization tradeoff discussions
  • more advanced search decision systems over time

What I’d build on next

The next steps I would prioritize are:

  • expanding feature coverage with additional behavioral and contextual signals
  • improving long-tail handling in sparse-data categories
  • layering in stronger personalization where signal quality supports it
  • building more automated eval and regression checks for ranking quality
  • tightening the connection between ranking experiments and marketplace monetization strategy