Softmatches & Recommendations
Overview
This deep dive was inspired by my work at Meta. However, a number of details have been removed due to confidentiality.
Recommendations are a critical part of modern social network applications, driving user engagement, growth, and overall platform value. In the Meta ecosystem, recommendation features such as “People You May Know” or “Accounts You May Follow” serve as powerful discovery tools that help users expand their networks.
Meta’s unique advantage lies in its ability to leverage interconnected social graphs across Facebook, Instagram, and WhatsApp, creating a comprehensive user profile that spans multiple platforms. This cross-platform data integration allows Meta to generate more accurate and personalized recommendations, enhancing user engagement across its ecosystem. For example, a user’s Facebook friend list could inform Instagram’s “Suggested Follows,” while their WhatsApp contact list might enhance Facebook’s “People You May Know” feature.
What is softmatching and how does it work?
Softmatching is a probabilistic approach to identity matching that allows for fuzzy comparisons between user profiles or identities. Unlike hardmatching, which requires exact matches on specific identifiers, softmatching uses a combination of features to estimate the likelihood that two profiles represent the same entity. Entities can be businesses or individuals, but in this article we primarily discuss individuals.
One can build a fairly rudimentary model by identifying relevant attributes such as name, email, phone number, location, age, and behavioral patterns. You can then create derived features such as Levenshtein distance for string comparisons or phonetic encodings of names.
Once you have this data, you can hand-label the dataset to identify matches. You can also use attributes that are unique under reasonable conditions, such as email and phone number, to decide whether two profiles match. Note that phone numbers can be recycled and reassigned to different people.
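A minimal sketch of the derived features mentioned above, assuming a toy `Profile` record with only name, age, and city (a hypothetical schema; real profiles carry far more attributes) and a hypothetical city alias table. Levenshtein distance and American Soundex, one common phonetic encoding, are implemented by hand so the example needs only the standard library.

```python
from dataclasses import dataclass


@dataclass
class Profile:
    # Hypothetical schema for illustration only.
    name: str
    age: int
    city: str


# Hypothetical alias table; a real system would need a much richer normalizer.
CITY_ALIASES = {"nyc": "new york", "la": "los angeles"}


def levenshtein(a: str, b: str) -> int:
    """Dynamic-programming edit distance (insert, delete, substitute each cost 1)."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]
        for j, cb in enumerate(b, start=1):
            curr.append(min(prev[j] + 1,               # deletion
                            curr[j - 1] + 1,           # insertion
                            prev[j - 1] + (ca != cb)))  # substitution, free if equal
        prev = curr
    return prev[-1]


def soundex(name: str) -> str:
    """American Soundex: first letter followed by three digits."""
    codes = {}
    for letters, digit in [("bfpv", "1"), ("cgjkqsxz", "2"), ("dt", "3"),
                           ("l", "4"), ("mn", "5"), ("r", "6")]:
        for ch in letters:
            codes[ch] = digit
    name = "".join(ch for ch in name.lower() if ch.isalpha())
    if not name:
        return ""
    result, last = name[0].upper(), codes.get(name[0], "")
    for ch in name[1:]:
        digit = codes.get(ch, "")
        if digit and digit != last:
            result += digit
        if ch not in "hw":  # vowels separate equal codes; h and w do not
            last = digit
    return (result + "000")[:4]


def normalize_city(city: str) -> str:
    c = city.strip().lower()
    return CITY_ALIASES.get(c, c)


def pair_features(a: Profile, b: Profile) -> dict:
    name_a, name_b = a.name.lower(), b.name.lower()
    return {
        "name_edit_distance": levenshtein(name_a, name_b),
        "same_soundex": int(soundex(name_a) == soundex(name_b)),
        "age_diff": abs(a.age - b.age),
        "same_city": int(normalize_city(a.city) == normalize_city(b.city)),
    }


print(pair_features(Profile("John", 30, "NYC"), Profile("Jon", 31, "New York")))
# {'name_edit_distance': 1, 'same_soundex': 1, 'age_diff': 1, 'same_city': 1}
```

The phonetic code lets “John” and “Jon” (both J500) or “Smith” and “Smyth” (both S530) agree even when the raw strings differ.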
| Profile A | Profile B | Match (1 = same person) |
|---|---|---|
| John, 30, NYC | Jon, 31, New York | 1 |
| Alice, 25, LA | Alice, 45, Chicago | 0 |
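The labeling and training steps can be sketched as follows. This is a hedged illustration, not the production pipeline: `weak_label` is a hypothetical auto-labeling rule built on the email and phone caveats above, and the model is a plain logistic regression trained by batch gradient descent. The two table rows are hand-encoded as hypothetical feature vectors; two rows are far too few to learn sensible weights, so this only shows the mechanics.

```python
import math
from typing import List, Optional, Sequence, Tuple


def weak_label(email_a: Optional[str], email_b: Optional[str],
               phone_a: Optional[str], phone_b: Optional[str],
               names_similar: bool) -> Optional[int]:
    """Hypothetical rule: 1 = match, None = send the pair to hand labeling."""
    if email_a and email_a == email_b:
        return 1
    # Phone numbers get recycled, so only trust a shared number if names agree too.
    if phone_a and phone_a == phone_b and names_similar:
        return 1
    return None


def sigmoid(z: float) -> float:
    # Split by sign so math.exp never overflows for large |z|.
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def train_logistic(X: Sequence[Sequence[float]], y: Sequence[int],
                   lr: float = 0.5, epochs: int = 1000) -> Tuple[List[float], float]:
    """Batch gradient descent on the mean log loss."""
    w, b, n = [0.0] * len(X[0]), 0.0, len(X)
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * len(w), 0.0
        for xi, yi in zip(X, y):
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            grad_w = [g + err * xj for g, xj in zip(grad_w, xi)]
            grad_b += err
        w = [wj - lr * g / n for wj, g in zip(w, grad_w)]
        b -= lr * grad_b / n
    return w, b


def predict(w: List[float], b: float, x: Sequence[float]) -> float:
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b)


# Hypothetical encoding: [name edit distance, same phonetic code, age diff / 10, same city]
X = [[1, 1, 0.1, 1],   # John, 30, NYC  vs  Jon, 31, New York
     [0, 1, 2.0, 0]]   # Alice, 25, LA  vs  Alice, 45, Chicago
y = [1, 0]
w, b = train_logistic(X, y)
print(predict(w, b, X[0]), predict(w, b, X[1]))
```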
How can softmatches be used in friend recommendations?
Softmatches can enhance friend recommendations in social networks by expanding the pool of potential connections beyond exact matches. In the context of friend recommendations, softmatching algorithms can identify similarities between users based on a combination of factors such as partial name matches, similar educational backgrounds, overlapping professional histories, or shared interests. For instance, if a user named “John Smith” is a recommendation on Facebook, that account might match “Jon Smyth” on Instagram, who attended the same university or worked in the same industry. For a third user who is on both platforms, if one of these accounts is a valid recommendation, the other one may be a valid recommendation as well.
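The cross-platform reuse described above can be sketched as a simple mapping step. This is an assumption-laden illustration: the account IDs, the `threshold` value, and the choice to discount each score by the match probability are all hypothetical, not how Meta's systems worked.

```python
from typing import Dict, List, Tuple


def propagate_recommendations(
    fb_recs: List[Tuple[str, float]],
    softmatches: Dict[str, Tuple[str, float]],
    threshold: float = 0.9,
) -> List[Tuple[str, float]]:
    """Carry one viewer's Facebook candidates over to softmatched Instagram accounts.

    fb_recs: [(fb_account_id, ranking_score), ...] for a viewer on both apps.
    softmatches: {fb_account_id: (ig_account_id, match_probability)}
    Pairs below `threshold` are dropped; surviving scores are discounted by the
    match probability so uncertain identities rank lower.
    """
    ig_recs = []
    for fb_id, score in fb_recs:
        if fb_id not in softmatches:
            continue
        ig_id, prob = softmatches[fb_id]
        if prob >= threshold:
            ig_recs.append((ig_id, score * prob))
    return sorted(ig_recs, key=lambda rec: rec[1], reverse=True)


fb_recs = [("fb:john_smith", 0.8), ("fb:alice", 0.7)]
softmatches = {"fb:john_smith": ("ig:jon_smyth", 0.95), "fb:alice": ("ig:alice_c", 0.4)}
print(propagate_recommendations(fb_recs, softmatches))  # only ig:jon_smyth survives
```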
Why can softmatching be an issue?
While softmatching can be used to generate high-quality recommendations, it raises subtle privacy issues. For example, if a user has not explicitly said that two accounts on different networks belong to them, is it acceptable to assume so based on a machine learning model? There are also competition concerns: only a few companies in the world have the data needed to softmatch, which can make them irreplaceable.
Softmatching vs Hardmatching
Softmatching occurs when we use a model to infer that two accounts are related. Hardmatching occurs when the user explicitly connects two accounts, usually to use a feature.
Requirements
- Migrate from softmatch to hardmatch features without any downtime
- Build a data model that can be extended in the future to other account models such as WhatsApp and Oculus
- Ensure that softmatching is no longer used anywhere in the codebases.
- Regulate future use of hardmatches.
Data Model changes
Challenges with the old data model:
- It was hard to traverse the graph DB from an FB account to its connected IG account.
- It was impossible to extend the match concept to other apps in Meta's family of apps.
The new data model shown below solved both problems by creating a new entity type linked to the core account model. The entity also connected to the IG secondary account models, which made traversal easy.
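Since the real schema is confidential, here is only a hedged, in-memory sketch of the idea: one hypothetical `AccountLink` entity per user-confirmed identity, anchored on a core account and holding edges to each explicitly connected app account. All class and method names are illustrative. Because every connected account indexes back to the same link, FB-to-IG traversal is a single lookup, and adding WhatsApp or Oculus is just another enum value.

```python
from dataclasses import dataclass, field
from enum import Enum
from typing import Dict, Optional


class App(Enum):
    FACEBOOK = "facebook"
    INSTAGRAM = "instagram"
    WHATSAPP = "whatsapp"
    OCULUS = "oculus"


@dataclass(frozen=True)
class Account:
    app: App
    account_id: str


@dataclass
class AccountLink:
    """Hypothetical link entity: only user-confirmed (hardmatched) connections."""
    link_id: str
    core_account: Account
    connected: Dict[App, Account] = field(default_factory=dict)


class LinkStore:
    """Stand-in for the graph DB, indexing every account back to its link."""

    def __init__(self) -> None:
        self._links: Dict[str, AccountLink] = {}
        self._index: Dict[Account, str] = {}

    def create_link(self, link_id: str, core: Account) -> AccountLink:
        link = AccountLink(link_id, core)
        self._links[link_id] = link
        self._index[core] = link_id
        return link

    def connect(self, link_id: str, account: Account) -> None:
        link = self._links[link_id]
        old = link.connected.get(account.app)
        if old is not None:
            self._index.pop(old, None)  # replacing an account for the same app
        link.connected[account.app] = account
        self._index[account] = link_id

    def disconnect(self, link_id: str, app: App) -> None:
        account = self._links[link_id].connected.pop(app, None)
        if account is not None:
            self._index.pop(account, None)

    def linked_account(self, account: Account, target: App) -> Optional[Account]:
        link_id = self._index.get(account)
        if link_id is None:
            return None
        link = self._links[link_id]
        if link.core_account.app == target:
            return link.core_account
        return link.connected.get(target)


store = LinkStore()
fb, ig = Account(App.FACEBOOK, "fb1"), Account(App.INSTAGRAM, "ig1")
store.create_link("L1", fb)
store.connect("L1", ig)
print(store.linked_account(fb, App.INSTAGRAM))
```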
To migrate to this new data model, we had to implement a number of systems.
1. Comprehensive Lineage Tracking
Tracking softmatching features across diverse tech stacks and data warehouses presents a significant challenge due to the complexity and scale of the codebases.
We needed to implement lineage tracking in both application codebases and data warehouses. In application codebases, you can use static analysis tools such as Pysa (Python) and Mariana Trench (Java/Android) to trace how data flows through the code. In data warehouses, you have to analyze the DAGs that generate the artifacts to track how features flow between them.
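For the warehouse side, the DAG analysis boils down to finding every artifact reachable from the softmatch source tables. A minimal breadth-first sketch, assuming the pipeline is available as a `{node: [children]}` adjacency map; all table names are hypothetical.

```python
from collections import deque
from typing import Dict, Iterable, List, Set


def downstream_artifacts(dag: Dict[str, List[str]], sources: Iterable[str]) -> Set[str]:
    """Every node reachable from `sources`, with edges pointing input -> derived artifact."""
    seen: Set[str] = set()
    queue = deque(sources)
    while queue:
        node = queue.popleft()
        for child in dag.get(node, []):
            if child not in seen:
                seen.add(child)
                queue.append(child)
    return seen


# Hypothetical pipeline; table names are illustrative only.
pipeline = {
    "softmatch_pairs": ["cross_app_features"],
    "hardmatch_links": ["cross_app_features_v2"],
    "cross_app_features": ["pymk_training_data", "graph_embeddings"],
    "graph_embeddings": ["pymk_training_data"],
    "pymk_training_data": ["pymk_model"],
}

print(sorted(downstream_artifacts(pipeline, {"softmatch_pairs"})))
# ['cross_app_features', 'graph_embeddings', 'pymk_model', 'pymk_training_data']
```

Every artifact in the result is a candidate for migration or deletion; `cross_app_features_v2` is correctly absent because it derives only from hardmatches.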
2. Updating Call Sites
Identifying and updating tens of thousands of call sites in both online and offline contexts required a systematic approach to avoid disruptions and ensure complete coverage.
Instead of updating these call sites manually, we built several code refactoring libraries. One strategy is to implement syntax-aware transformations that update function calls and data access patterns. We also rolled the changes out in stages, moving from non-critical to critical use cases, and used feature flags to enable gradual rollout and easy rollback.
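A syntax-aware transformation can be sketched with Python's standard `ast` module (Python 3.9+ for `ast.unparse`). The function names in `RENAMES` are hypothetical. The point is that only real call sites are rewritten, while a string or comment that merely mentions the old name is left alone, which a plain text search-and-replace would get wrong. Note that `ast.unparse` discards comments and original formatting, so a production codemod would more likely use a concrete-syntax-tree library such as LibCST.

```python
import ast

# Hypothetical old -> new accessor names.
RENAMES = {"get_softmatched_ig_account": "get_hardmatched_ig_account"}


class RenameCalls(ast.NodeTransformer):
    """Rewrite calls to deprecated functions, matching on syntax rather than text."""

    def visit_Call(self, node: ast.Call) -> ast.AST:
        self.generic_visit(node)  # rewrite nested calls in the arguments too
        func = node.func
        if isinstance(func, ast.Name) and func.id in RENAMES:
            func.id = RENAMES[func.id]          # plain call: f(...)
        elif isinstance(func, ast.Attribute) and func.attr in RENAMES:
            func.attr = RENAMES[func.attr]      # method call: obj.f(...)
        return node


def codemod(source: str) -> str:
    tree = RenameCalls().visit(ast.parse(source))
    return ast.unparse(ast.fix_missing_locations(tree))


before = '''
log("get_softmatched_ig_account is deprecated")
acct = accounts.get_softmatched_ig_account(user_id)
'''
print(codemod(before))
```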
Friend recommendation algorithms
Objective
The primary goal is often to increase user engagement on the platform, for example by recommending feed content based on common interests within the social graph. Different apps place different weight on the social graph, though. For example, TikTok reportedly ranks its feed primarily on the quality of the content rather than on the social graph.
Signals
Friend recommendation algorithms suggest new friends by using various data points and signals to predict potential social connections. Some examples are:
Mutual Friends: The primary factor is the number of mutual friends. If you and another person have many friends in common, there is a higher probability that you know each other.
Profile Information: The algorithm looks at profile details such as current city, hometown, work, education, and interests. If these details match or overlap with other users, it might suggest them as potential connections.
Phone Contacts and Email: If you have shared your phone contacts or email addresses with an app (e.g., through the app’s contact sync feature), the algorithm can match these details with other users who have also shared their contacts. If there’s a match, it may suggest that person to you.
Interaction Patterns: How someone interacts with existing friends and other users (e.g., liking posts, commenting, tagging, or engaging in the same groups or events) is another signal. Users who engage with the same content might be suggested to each other.
Group Memberships: Being a member of the same groups or attending similar events can prompt a recommendation. This is especially the case for large, active groups where Facebook assumes a greater likelihood of shared interests or acquaintances.
Network and Location Data: Apps can collect data on network proximity (e.g., IP addresses, Wi-Fi networks) and real-world location if location services are enabled. If two users are often at the same places or use similar networks, they might be suggested to each other.
Third-Party App and Website Interactions: If you use Facebook to log into other apps or websites, or if you visit other sites that embed tracking pixels, this data can be used to refine recommendations. For example, if multiple users frequently visit or engage with the same sites, Facebook might assume they share common interests.
Explicit User Actions: If you explicitly search for someone, visit their profile, or frequently engage with certain types of content or profiles, apps may take this as a signal to suggest that person as a friend.
Friends of friends: Another common pattern is for the algorithm to suggest “friends of friends,” especially those who are geographically close or share similar demographics or interests.
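The two graph signals at the top and bottom of this list, mutual friends and friends of friends, combine naturally: enumerate 2-hop candidates and rank them by how many mutual friends each shares with the viewer. A minimal sketch over a toy adjacency map (user names are made up); a real system would merge this with the other signals rather than rank on it alone.

```python
from collections import Counter
from typing import Dict, List, Set, Tuple


def friends_of_friends(graph: Dict[str, Set[str]], user: str,
                       top_k: int = 5) -> List[Tuple[str, int]]:
    """Rank 2-hop candidates by mutual-friend count, highest first.

    graph: undirected adjacency as {user: set_of_friends}.
    """
    direct = graph.get(user, set())
    mutuals: Counter = Counter()
    for friend in direct:
        for fof in graph.get(friend, set()):
            if fof != user and fof not in direct:  # skip self and existing friends
                mutuals[fof] += 1
    return mutuals.most_common(top_k)


graph = {
    "ana": {"bo", "cy"},
    "bo": {"ana", "dee", "eve"},
    "cy": {"ana", "dee"},
    "dee": {"bo", "cy"},
    "eve": {"bo"},
}
print(friends_of_friends(graph, "ana"))
# [('dee', 2), ('eve', 1)]
```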
Filtering
Of course, these signals can generate a large number of candidates. A common model for ranking them and filtering them down to the UI budget (the number of recommendation slots shown) is the Wide & Deep learning model.
Source: https://arxiv.org/pdf/1606.07792
This model can be used to rank the candidates based on the defined objective function.
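To make the architecture concrete, here is a forward-pass-only sketch of Wide & Deep in plain Python: a linear "wide" part over one-hot and crossed categorical features, plus a "deep" part that embeds each feature and passes the concatenation through a ReLU layer, with both outputs summed into a single logit. The feature names are hypothetical, the weights are random or hand-set, and the joint training described in the paper is omitted.

```python
import math
import random
from typing import Dict, List, Tuple

# Hypothetical categorical features describing a (viewer, candidate) pair.
FIELDS = ["same_city", "same_school", "mutual_bucket"]
EMB_DIM, HIDDEN = 4, 8


def wide_keys(features: Dict[str, object]) -> List[str]:
    """Wide-part inputs: one-hot keys plus pairwise cross-product transformations."""
    keys = [f"{name}={features[name]}" for name in FIELDS]
    crosses = [f"{a}&{b}" for i, a in enumerate(keys) for b in keys[i + 1:]]
    return keys + crosses


class WideAndDeep:
    def __init__(self, seed: int = 0):
        self.rng = random.Random(seed)
        self.wide_w: Dict[str, float] = {}      # sparse linear weights (wide part)
        self.emb: Dict[str, List[float]] = {}   # embedding table (deep part)
        in_dim = EMB_DIM * len(FIELDS)
        self.w1 = [[self.rng.gauss(0, 0.1) for _ in range(in_dim)] for _ in range(HIDDEN)]
        self.w2 = [self.rng.gauss(0, 0.1) for _ in range(HIDDEN)]
        self.bias = 0.0

    def _embed(self, key: str) -> List[float]:
        # Untrained embeddings, created lazily the first time a value is seen.
        if key not in self.emb:
            self.emb[key] = [self.rng.gauss(0, 0.1) for _ in range(EMB_DIM)]
        return self.emb[key]

    def score(self, features: Dict[str, object]) -> float:
        wide = sum(self.wide_w.get(k, 0.0) for k in wide_keys(features))
        x = [v for name in FIELDS for v in self._embed(f"{name}={features[name]}")]
        hidden = [max(0.0, sum(w * xi for w, xi in zip(row, x))) for row in self.w1]  # ReLU
        deep = sum(w * h for w, h in zip(self.w2, hidden))
        # Wide and deep outputs are summed before a single sigmoid.
        return 1.0 / (1.0 + math.exp(-(wide + deep + self.bias)))


def rank(model: WideAndDeep, candidates: Dict[str, Dict[str, object]],
         budget: int) -> List[Tuple[str, float]]:
    """Score every candidate and keep the top `budget` for the UI."""
    scored = [(cid, model.score(f)) for cid, f in candidates.items()]
    return sorted(scored, key=lambda s: s[1], reverse=True)[:budget]


model = WideAndDeep()
# Stand-in for a learned weight: "same city AND same school" is a strong signal.
model.wide_w["same_city=1&same_school=1"] = 2.0
candidates = {
    "u1": {"same_city": 1, "same_school": 1, "mutual_bucket": "1-5"},
    "u2": {"same_city": 1, "same_school": 0, "mutual_bucket": "0"},
    "u3": {"same_city": 0, "same_school": 0, "mutual_bucket": "0"},
}
print(rank(model, candidates, budget=2))
```

The crossed key is what the wide part contributes: it can memorize a specific feature combination, while the embeddings let the deep part generalize to combinations it has not seen.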
Systems
Note that in a production recommendation system, it is rarely practical to do heavy computation at request time. In most apps, the request-time latency budget is on the order of tens to hundreds of milliseconds. A better architecture is to generate recommendations periodically and cache the results. For example, if the UI budget is 5 recommendations, it is probably good to generate 100 ranked recommendations. This raises its own issues: cache size, how to prioritize which users to generate recommendations for, and what to do when a user runs through all their recommendations. Those can be the topic of a separate post.
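A minimal sketch of the precompute-then-serve pattern, assuming a hypothetical in-memory `RecCache`: an offline job stores a long ranked list per user, and each request only slices unseen items from it. Marking items as shown the moment they are served, and queueing a refill once the cache runs dry, are illustrative policy choices; they are exactly the open questions mentioned above.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Set


@dataclass
class RecCache:
    """Hypothetical serving cache: offline jobs fill it, requests only read it."""
    ranked: Dict[str, List[str]] = field(default_factory=dict)  # user -> ranked candidates
    shown: Dict[str, Set[str]] = field(default_factory=dict)    # user -> already-served ids
    refill_queue: List[str] = field(default_factory=list)       # users needing regeneration

    def store(self, user: str, ranked_candidates: List[str]) -> None:
        # Called by the periodic offline job, e.g. with ~100 ranked candidates.
        self.ranked[user] = list(ranked_candidates)

    def serve(self, user: str, ui_budget: int = 5) -> List[str]:
        # Cheap at request time: a filter and a slice, no model inference.
        shown = self.shown.setdefault(user, set())
        fresh = [c for c in self.ranked.get(user, []) if c not in shown][:ui_budget]
        shown.update(fresh)
        if len(fresh) < ui_budget and user not in self.refill_queue:
            # Cache exhausted: ask the offline pipeline to regenerate for this user.
            self.refill_queue.append(user)
        return fresh


cache = RecCache()
cache.store("ana", [f"c{i}" for i in range(12)])
print(cache.serve("ana"))
```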
Removing softmatch-based signals
Now that we have some intuition around softmatching and friend recommendation systems, let’s dive into the challenges of removing softmatch-based signals from the recommendation system. As one might imagine, this left a big gap.
The feature groups impacted by deprecating softmatch signals were:
- Mutual Friends
- Friends of friends
- Phone Contacts and Email
Softmatch features were fed into the models both as traditional features and through graph embeddings: when building those embeddings, softmatches decided when a node could be traversed.
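The article does not describe how the embeddings were built, so the following is only a sketch under an assumed DeepWalk-style setup: uniform random walks over a graph whose cross-app edges are admitted either by softmatch probability (old behaviour) or only by hardmatch (new behaviour). The walks would then be fed to a skip-gram model such as word2vec, which is omitted here; all IDs and the threshold are hypothetical.

```python
import random
from typing import Dict, List, Optional, Sequence, Tuple


def build_edges(friend_edges: Sequence[Tuple[str, str]],
                cross_app_links: Sequence[Tuple[str, str, str, float]],
                min_prob: Optional[float] = None) -> List[Tuple[str, str]]:
    """Combine in-app friend edges with cross-app identity edges.

    cross_app_links: (fb_id, ig_id, kind, prob) with kind in {"soft", "hard"}.
    With min_prob set, softmatches at or above it are traversable (old behaviour);
    with min_prob=None only hardmatches are (new behaviour).
    """
    edges = list(friend_edges)
    for fb_id, ig_id, kind, prob in cross_app_links:
        if kind == "hard" or (min_prob is not None and prob >= min_prob):
            edges.append((fb_id, ig_id))
    return edges


def random_walks(edges: Sequence[Tuple[str, str]], walks_per_node: int = 2,
                 walk_len: int = 4, seed: int = 0) -> List[List[str]]:
    """Uniform random walks over an undirected edge list (DeepWalk-style corpus)."""
    rng = random.Random(seed)
    adj: Dict[str, List[str]] = {}
    for a, b in edges:
        adj.setdefault(a, []).append(b)
        adj.setdefault(b, []).append(a)
    walks = []
    for start in sorted(adj):
        for _ in range(walks_per_node):
            walk = [start]
            while len(walk) < walk_len:
                walk.append(rng.choice(adj[walk[-1]]))
            walks.append(walk)
    return walks


friends = [("fb:a", "fb:b"), ("ig:x", "ig:y")]
links = [("fb:b", "ig:x", "soft", 0.95)]
old_walks = random_walks(build_edges(friends, links, min_prob=0.9))
new_walks = random_walks(build_edges(friends, links))
print(sum(any(n.startswith("ig:") for n in w) and w[0].startswith("fb:") for w in new_walks))
```

With softmatch edges removed, no walk starting on Facebook can reach Instagram nodes, so the resulting embeddings lose that cross-app context entirely.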
Quick hacks
- Retrain the model without softmatches: This naturally gives poorer results but helps you create a baseline.
- Ask the user to provide hard matches. This is a product challenge which can be covered separately.
Engineering challenges
It is worth delving into some of the engineering challenges that can come up when removing a collection of features like softmatching.
Engineering objectives
- Maintain or improve the accuracy and performance of recommendation systems.
- Minimize disruption to user experience and key business metrics during the transition.
Challenges
1. Maintaining Model Quality and Performance
Keeping model quality and performance consistent despite the fundamental change in feature distribution is crucial for preserving the user experience. One interesting thing we tried was to decompose the softmatch features for pairs that also had a hardmatch and to train the model on the result. This approach did not work well, but it was an interesting experiment.
2. Evaluating embedding changes
While it was trivial to observe the difference in distribution between softmatch and hardmatch features, it was much harder to understand how the embeddings changed between the softmatch and hardmatch versions, beyond their results on downstream tasks.
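One generic way to compare two embedding spaces directly, offered here as an assumption rather than what we actually did, is neighbourhood overlap: for each node, compute its k nearest neighbours (by cosine similarity) in the old and new spaces and average the Jaccard overlap. It needs no alignment between the spaces, so a pure rotation scores 1.0. The sketch assumes non-zero vectors and uses brute-force search, which only suits small samples of nodes.

```python
import math
from typing import Dict, List, Set

Embedding = Dict[str, List[float]]


def cosine(u: List[float], v: List[float]) -> float:
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.sqrt(sum(a * a for a in u)) * math.sqrt(sum(b * b for b in v)))


def knn(emb: Embedding, node: str, k: int) -> Set[str]:
    others = [n for n in emb if n != node]
    others.sort(key=lambda n: cosine(emb[node], emb[n]), reverse=True)
    return set(others[:k])


def neighbor_overlap(emb_old: Embedding, emb_new: Embedding, k: int = 10) -> float:
    """Mean Jaccard overlap of each shared node's k nearest neighbours in both spaces.

    1.0 means every node kept its neighbourhood; values near 0 mean local
    geometry changed completely.
    """
    shared = [n for n in emb_old if n in emb_new]
    old = {n: emb_old[n] for n in shared}
    new = {n: emb_new[n] for n in shared}
    scores = []
    for n in shared:
        a, b = knn(old, n, k), knn(new, n, k)
        scores.append(len(a & b) / len(a | b) if a | b else 1.0)
    return sum(scores) / len(scores) if scores else 0.0


emb_old = {"a": [1, 0], "b": [0.9, 0.1], "c": [0, 1], "d": [0.1, 0.9]}
emb_rotated = {"a": [0, 1], "b": [-0.1, 0.9], "c": [-1, 0], "d": [-0.9, 0.1]}
print(neighbor_overlap(emb_old, emb_rotated, k=1))  # 1.0: a rotation keeps neighbourhoods
```

Looking at which nodes have the lowest overlap can also point to the users whose neighbourhoods depended most on softmatch edges.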
3. Removing hardmatches
While softmatches never needed to be removed immediately, hardmatches did need to stop factoring into the model as soon as they were removed. For example, if a user disconnects their FB and IG accounts, we don’t want the model to recommend matches across those accounts. We experimented with two options: serving the main model with the hardmatch features nulled out, or serving a model trained without hardmatch features. Because we were worried about leakage, we chose the latter.
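The option we chose can be sketched as a routing decision at serving time. Everything here is hypothetical (`ModelRouter`, the feature names, the toy models); the key property is that a user without an active hardmatch is scored by a model that never saw hardmatch features, and those features are stripped from the request rather than zeroed.

```python
from dataclasses import dataclass
from typing import Callable, Dict, Sequence

Features = Dict[str, float]


@dataclass
class ModelRouter:
    """Hypothetical router: pick the model based on the user's current link state."""
    main_model: Callable[[Features], float]
    no_hardmatch_model: Callable[[Features], float]
    hardmatch_feature_names: Sequence[str]

    def score(self, features: Features, has_active_hardmatch: bool) -> float:
        if has_active_hardmatch:
            return self.main_model(features)
        # Drop hardmatch features entirely so nothing stale reaches the model.
        clean = {k: v for k, v in features.items()
                 if k not in self.hardmatch_feature_names}
        return self.no_hardmatch_model(clean)


router = ModelRouter(
    main_model=lambda f: 1.0,
    no_hardmatch_model=lambda f: 0.0 if "hm_mutual_friends" in f else 0.5,
    hardmatch_feature_names=["hm_mutual_friends"],
)
print(router.score({"hm_mutual_friends": 3.0, "mutual_friends": 5.0}, False))
```

A nulled-out feature is still an input the main model learned to interpret, and zero may carry meaning it never saw during training; a separately trained model avoids that ambiguity.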
4. Shadow testing
We did a lot of shadow testing of the new model and compared its MAP and recall against the existing stable model. Note that shadow testing doesn’t give you an estimate of business metrics; we got those from A/B testing. The goal of shadow testing was to understand how different the recommendations were. While this was interesting, it was hard to settle on an acceptable threshold, because we observed that many of the new model's differing recommendations were good for valid reasons. A/B testing, with business metrics as the primary safeguard, gave us the level of confidence required to move ahead.
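For reference, a standard-library sketch of the two offline metrics, MAP@k and recall@k. It assumes some ground-truth set of relevant candidates per user (for instance, friend requests later sent or accepted; that choice is hypothetical here). AP@k is normalized by `min(len(relevant), k)`, which is one common convention among several.

```python
from typing import Dict, List, Sequence, Set, Tuple


def average_precision_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    """AP@k: mean of precision@i over the ranks i <= k that hold a relevant item."""
    hits, total = 0, 0.0
    for i, item in enumerate(ranked[:k], start=1):
        if item in relevant:
            hits += 1
            total += hits / i
    denom = min(len(relevant), k)
    return total / denom if denom else 0.0


def recall_at_k(ranked: Sequence[str], relevant: Set[str], k: int) -> float:
    return len(set(ranked[:k]) & relevant) / len(relevant) if relevant else 0.0


def evaluate(recs_by_user: Dict[str, List[str]], truth_by_user: Dict[str, Set[str]],
             k: int = 5) -> Tuple[float, float]:
    """(MAP@k, mean recall@k) over users with non-empty ground truth."""
    users = [u for u in truth_by_user if truth_by_user[u]]
    if not users:
        return 0.0, 0.0
    ap = [average_precision_at_k(recs_by_user.get(u, []), truth_by_user[u], k) for u in users]
    rec = [recall_at_k(recs_by_user.get(u, []), truth_by_user[u], k) for u in users]
    return sum(ap) / len(users), sum(rec) / len(users)


stable = {"ana": ["a", "b", "c"]}
shadow = {"ana": ["c", "a", "b"]}
truth = {"ana": {"a", "c"}}
print(evaluate(stable, truth, k=3), evaluate(shadow, truth, k=3))
```

Running both models over the same logged requests gives directly comparable numbers, but as noted above, a gap between them does not by itself say which recommendations are better.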
5. Testing on different demographics
We needed to test whether the new model was biased against certain demographics. This was key to getting sign-off, and it was very hard to do because good datasets for this were not available.
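Given segment labels (the hard part, as noted above), the check itself is a sliced evaluation: compute the metric per segment for both models and flag segments where the new model drops by more than some tolerance. A minimal sketch; the segment names and the `max_drop` tolerance are hypothetical.

```python
from collections import defaultdict
from typing import Dict, Iterable, List, Set, Tuple


def recall_by_segment(rows: Iterable[Tuple[str, List[str], Set[str]]],
                      k: int = 5) -> Dict[str, float]:
    """rows: (segment, ranked_recs, relevant_set). Returns {segment: mean recall@k}."""
    sums: Dict[str, float] = defaultdict(float)
    counts: Dict[str, int] = defaultdict(int)
    for segment, ranked, relevant in rows:
        if not relevant:
            continue  # recall is undefined without ground truth
        sums[segment] += len(set(ranked[:k]) & relevant) / len(relevant)
        counts[segment] += 1
    return {s: sums[s] / counts[s] for s in counts}


def segment_regressions(old: Dict[str, float], new: Dict[str, float],
                        max_drop: float = 0.05) -> List[str]:
    """Segments whose metric fell by more than `max_drop` (absolute) under the new model."""
    return sorted(s for s in old if s in new and old[s] - new[s] > max_drop)


old_model = {"segment_a": 0.50, "segment_b": 0.40}
new_model = {"segment_a": 0.49, "segment_b": 0.30}
print(segment_regressions(old_model, new_model))  # ['segment_b']
```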
Summary
In this deep dive into recommendation systems, inspired by work at Meta, we explored the intricate world of friend recommendations and the transition from softmatching to hardmatching. Key points included:
- The importance of recommendations in social networks for user engagement and growth.
- The concept of softmatching and its role in generating high-quality recommendations.
- The challenges and privacy concerns associated with softmatching.
- The process of migrating from softmatch to hardmatch features, including data model changes and implementation challenges.
- An overview of friend recommendation algorithms, including signals used and filtering techniques.
- The engineering challenges faced when removing softmatch-based signals from recommendation systems.
This transition from softmatching to hardmatching represents a significant shift in how social networks approach user connections and recommendations. While it presents numerous technical and product challenges, it also offers opportunities for improved user privacy and more transparent recommendation systems.
As social networks continue to evolve, the balance between powerful recommendation algorithms and user privacy will remain a critical consideration. The lessons learned from this transition can inform future developments in recommendation systems across various platforms and industries. Moving forward, it will be crucial for social networks and other platforms using recommendation systems to:
- Continuously refine their algorithms to provide value while respecting user privacy.
- Be transparent about the data used in making recommendations.
- Give users more control over their data and how it’s used in recommendation systems.
- Invest in robust testing methodologies to ensure fairness and avoid biases in recommendations.