
    Power recommendations and search using an IMDb knowledge graph – Part 2

    By Admincrypt, December 20, 2022


    This three-part series demonstrates how to use graph neural networks (GNNs) and Amazon Neptune to generate movie recommendations using the IMDb and Box Office Mojo Movies/TV/OTT licensable data package, which provides a wide range of entertainment metadata, including over 1 billion user ratings; credits for more than 11 million cast and crew members; 9 million movie, TV, and entertainment titles; and global box office reporting data from more than 60 countries. Many AWS media and entertainment customers license IMDb data through AWS Data Exchange to improve content discovery and increase customer engagement and retention.

    In Part 1, we discussed the applications of GNNs and how to transform and prepare our IMDb data for querying. In this post, we discuss how to use Neptune to generate the embeddings that power our out-of-catalog search in Part 3. We also go over Amazon Neptune ML, the machine learning (ML) feature of Neptune, and the code we use in our development process. In Part 3, we walk through how to apply our knowledge graph embeddings to an out-of-catalog search use case.

    Solution overview

    Large connected datasets often contain valuable information that can be hard to extract using queries based on human intuition alone. ML techniques can help find hidden correlations in graphs with billions of relationships. These correlations can be helpful for recommending products, predicting credit worthiness, identifying fraud, and many other use cases.

    Neptune ML makes it possible to build and train useful ML models on large graphs in hours instead of weeks. To accomplish this, Neptune ML uses GNN technology powered by Amazon SageMaker and the Deep Graph Library (DGL) (which is open-source). GNNs are an emerging field in artificial intelligence (for an example, see A Comprehensive Survey on Graph Neural Networks). For a hands-on tutorial about using GNNs with the DGL, see Learning graph neural networks with Deep Graph Library.

    In this post, we show how to use Neptune in our pipeline to generate embeddings.

    The following diagram depicts the overall flow of IMDb data from download to embedding generation.

    We use the following AWS services to implement the solution: Amazon Neptune (including Neptune ML), Amazon SageMaker, Amazon S3, and AWS CloudFormation.

    In this post, we walk you through the following high-level steps:

    1. Set up environment variables.
    2. Create an export job.
    3. Create a data processing job.
    4. Submit a training job.
    5. Download embeddings.

    Code for Neptune ML commands

    We use the following commands as part of implementing this solution:

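    %%neptune_ml export start
    %neptune_ml export status
    %neptune_ml dataprocessing start
    %neptune_ml dataprocessing status
    %neptune_ml training start
    %neptune_ml training status

    We use neptune_ml export to start a Neptune ML export process and check its status, neptune_ml dataprocessing to start a data processing job and check its status, and neptune_ml training to start a Neptune ML model training job and check its status.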

    For more information about these and other commands, refer to Using Neptune workbench magics in your notebooks.

    Prerequisites

    To follow along with this post, you should have the following:

    • An AWS account
    • Familiarity with SageMaker, Amazon S3, and AWS CloudFormation
    • Graph data loaded into the Neptune cluster (see Part 1 for more information)

    Set up environment variables

    Before we begin, set up your environment by defining two variables: s3_bucket_uri and processed_folder. s3_bucket_uri is the name of the bucket used in Part 1, and processed_folder is the Amazon S3 location for the output from the export job.

    # name of s3 bucket
    s3_bucket_uri = "<s3-bucket-name>"
    
    # the s3 location you want to store results
    processed_folder = f"s3://{s3_bucket_uri}/experiments/neptune-export/"
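    Despite its name, s3_bucket_uri holds a bare bucket name, and the s3:// prefix is added when processed_folder is built. The following is a minimal sketch of a helper that makes that convention explicit and rejects a full URI passed by mistake. The function name s3_uri and the bucket name my-example-bucket are hypothetical and not part of the original notebook.

```python
def s3_uri(bucket: str, *parts: str) -> str:
    """Join a bare bucket name and key parts into an s3:// URI ending in '/'."""
    if bucket.startswith("s3://"):
        raise ValueError("pass the bare bucket name, not a full s3:// URI")
    # Drop empty parts and stray slashes so the result has no '//' in the key
    cleaned = [p.strip("/") for p in parts if p.strip("/")]
    return "s3://" + "/".join([bucket.strip("/"), *cleaned]) + "/"


# Placeholder bucket name, for illustration only
example_folder = s3_uri("my-example-bucket", "experiments", "neptune-export")
print(example_folder)
```

    With a helper like this, processed_folder could be written as s3_uri(s3_bucket_uri, "experiments", "neptune-export"), which yields the same string as the f-string above.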

    Create an export job

    In Part 1, we created a SageMaker notebook and export service to export our data from the Neptune DB cluster to Amazon S3 in the required format.

    Now that our data is loaded and the export service is created, we need to create an export job and start it. To do this, we use NeptuneExportApiUri and create parameters for the export job. In the following code, we use the variables expo and export_params. Set expo to your NeptuneExportApiUri value, which you can find on the Outputs tab of your CloudFormation stack. For export_params, we use the endpoint of your Neptune cluster and provide the value for outputS3Path, which is the Amazon S3 location for the output from the export job.

    # Assumes the notebook has already run: import neptune_ml_utils as neptune_ml
    expo = "<NEPTUNE-EXPORT-URI>"
    export_params = {
        "command": "export-pg",
        "params": {
            "endpoint": neptune_ml.get_host(),
            "profile": "neptune_ml",
            "cloneCluster": True,
        },
        "outputS3Path": processed_folder,
        "additionalParams": {
            "neptune_ml": {
                "version": "v2.0",
            }
        },
        "jobSize": "medium",
    }

    To submit the export job, use the following command:

    %%neptune_ml export start --export-url {expo} --export-iam --store-to export_results --wait-timeout 1000000
    ${export_params}
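    Because an export can run for a long time, it can be worth checking the parameters before running the submit command. The following is a minimal sketch of such a check. The helper check_export_params is hypothetical, and the set of required keys is an assumption taken from the export_params dictionary shown earlier, not from the export service specification.

```python
def check_export_params(params: dict) -> list:
    """Return a list of human-readable problems; an empty list means none were found."""
    problems = []
    # Keys assumed to be required, based on the example dictionary in this post
    for key in ("command", "params", "outputS3Path"):
        if key not in params:
            problems.append(f"missing top-level key: {key}")
    if "endpoint" not in params.get("params", {}):
        problems.append("missing params.endpoint")
    output_path = params.get("outputS3Path", "")
    if output_path and not output_path.startswith("s3://"):
        problems.append("outputS3Path should start with s3://")
    return problems


# Sample with a deliberately malformed output path (placeholder values)
sample_params = {
    "command": "export-pg",
    "params": {"endpoint": "example-endpoint", "profile": "neptune_ml", "cloneCluster": True},
    "outputS3Path": "my-example-bucket/experiments/neptune-export/",
    "jobSize": "medium",
}
problems = check_export_params(sample_params)
print(problems)
```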

    To check the status of the export job, use the following command:

    %neptune_ml export status --export-url {expo} --export-iam --job-id {export_results['jobId']} --store-to export_results

    After your job is complete, store processed_folder in export_results under the processed_location key. The data processing job uses this value as the Amazon S3 location for the processed results:

    export_results['processed_location']= processed_folder

    Create a data processing job

    Now that the export is done, we create a data processing job to prepare the data for the Neptune ML training process. For this step, you can change the job_name and modelType variables, but all other parameters must remain the same. The most important setting is the modelType parameter (--model-type in the following code), which can be either heterogeneous for heterogeneous graph models or kge for knowledge graph embeddings.

    The export job also produces training-data-configuration.json. Use this file to add or remove any nodes or edges that you don’t want to provide for training (for example, if you want to predict the link between two nodes, you can remove that link in this configuration file). For this blog post, we use the original configuration file. For additional information, see Editing a training configuration file.

    Create your data processing job with the following code:

    job_name = neptune_ml.get_training_job_name("link-pred")
    processing_params = f"""--config-file-name training-data-configuration.json 
    --job-id {job_name}-DP 
    --s3-input-uri {export_results['outputS3Uri']}  
    --s3-processed-uri {export_results['processed_location']} 
    --model-type kge 
    --instance-type ml.m5.2xlarge
    """
    
    %neptune_ml dataprocessing start --store-to processing_results {processing_params}

    To check the status of the data processing job, use the following command:

    %neptune_ml dataprocessing status --job-id {processing_results['id']} --store-to processing_results

    Submit a training job

    After the processing job is complete, we can begin our training job, which is where we create our embeddings. We recommend an instance type of ml.m5.24xlarge, but you can change this to suit your computing needs. See the following code:

    dp_id = processing_results['id']

    # Build the training job name from the data processing ID, without hyphens
    training_job_name = dp_id + "training"
    training_job_name = "".join(training_job_name.split("-"))

    training_params = f"""--job-id train-{training_job_name}
    --data-processing-id {dp_id}
    --instance-type ml.m5.24xlarge
    --s3-output-uri s3://{str(s3_bucket_uri)}/training/{training_job_name}/"""

    %neptune_ml training start --store-to training_results {training_params}
    print(training_results)

    We print the training_results variable to get the ID for the training job. Use the following command to check the status of your job:

    %neptune_ml training status --job-id {training_results['id']} --store-to training_status_results
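    Export, data processing, and training jobs can each run for a long time, so instead of rerunning a status command by hand you may prefer a polling loop. The following sketch shows one way to structure it. The names wait_for_job and fetch_status and the status strings in TERMINAL_STATES are assumptions for illustration; they are not part of the Neptune workbench magics. In a notebook, fetch_status would wrap whatever call you use to read the job status.

```python
import time
from typing import Callable

# Assumed terminal status strings; adjust them to what your status call returns
TERMINAL_STATES = {"Completed", "Failed", "Stopped"}


def wait_for_job(
    fetch_status: Callable[[], str],
    poll_seconds: float = 60.0,
    max_polls: int = 120,
    sleep: Callable[[float], None] = time.sleep,
) -> str:
    """Poll fetch_status until it returns a terminal state or max_polls is reached."""
    for _ in range(max_polls):
        status = fetch_status()
        if status in TERMINAL_STATES:
            return status
        sleep(poll_seconds)
    raise TimeoutError(f"job did not finish after {max_polls} polls")


# Demo with a fake status source so the sketch runs without AWS access
fake_statuses = iter(["InProgress", "InProgress", "Completed"])
final_status = wait_for_job(lambda: next(fake_statuses), poll_seconds=0, sleep=lambda s: None)
print(final_status)
```

    Passing the sleep function as a parameter keeps the loop easy to test, because a test can replace it with a no-op.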

    Download embeddings

    After your training job is complete, the last step is to download your raw embeddings. The following steps show you how to download embeddings created by using KGE (you can use the same process for RGCN).

    In the following code, we use neptune_ml.get_mapping() and get_embeddings() to download the mapping file (mapping.info) and the raw embeddings file (entity.npy). Then we need to map the appropriate embeddings to their corresponding IDs.

    import pickle

    import numpy as np
    import pandas as pd
    from tqdm import tqdm

    # Download the raw embeddings and the mapping file for the training job
    neptune_ml.get_embeddings(training_status_results["id"])
    neptune_ml.get_mapping(training_status_results["id"])

    artifacts_dir = '/home/ec2-user/SageMaker/model-artifacts/' + training_status_results["id"]

    with open(artifacts_dir + '/mapping.info', "rb") as f:
        mapping = pickle.load(f)

    node2id = mapping['node2id']
    localid2globalid = mapping['node2gid']
    data = np.load(artifacts_dir + '/embeddings/entity.npy')

    # Flatten each movie embedding into (ITEM_ID, KEY, VALUE) rows
    movie_ids = list(node2id["movie"].keys())
    ITEM_ID = []
    KEY = []
    VALUE = []
    for node_id in tqdm(movie_ids):
        # Map node ID -> local ID -> global ID, which indexes into the embedding matrix
        index = localid2globalid['movie'][node2id['movie'][node_id]]
        embedding = data[index]
        ITEM_ID += [node_id] * embedding.shape[0]
        KEY += list(range(embedding.shape[0]))
        VALUE += list(embedding)

    meta_df = pd.DataFrame({"ITEM_ID": ITEM_ID, "KEY": KEY, "VALUE": VALUE})
    meta_df.to_csv('new_embeddings.csv')
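    The CSV stores one row per embedding dimension, with the columns ITEM_ID, KEY, and VALUE. To make the lookup and the resulting layout concrete, the following toy sketch repeats the same two-step mapping on made-up data and then pivots the long table back into one vector per movie. The IDs, the dictionaries standing in for mapping.info, and the array standing in for entity.npy are all invented for illustration; the real mapping objects may use different container types.

```python
import numpy as np
import pandas as pd

# Toy stand-ins for mapping.info and entity.npy (all values are made up)
node2id = {"movie": {"tt001": 0, "tt002": 1}}   # node ID -> local ID
node2gid = {"movie": {0: 2, 1: 0}}              # local ID -> global row index
data = np.array([[0.1, 0.2], [9.0, 9.0], [0.3, 0.4]])

# Build the long format: one row per (movie, embedding dimension)
rows = []
for node_id, local_id in node2id["movie"].items():
    vector = data[node2gid["movie"][local_id]]
    rows.extend((node_id, key, float(value)) for key, value in enumerate(vector))

long_df = pd.DataFrame(rows, columns=["ITEM_ID", "KEY", "VALUE"])

# Pivot back to one row per movie and one column per embedding dimension
wide_df = long_df.pivot(index="ITEM_ID", columns="KEY", values="VALUE")
print(wide_df)
```

    The wide form is usually more convenient for similarity computations, while the long form is easier to load into systems that expect key-value metadata.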

    To download RGCN embeddings, follow the same process with a new training job name: process the data with the modelType parameter set to heterogeneous, then train your model with the modelName parameter set to rgcn (refer to the Neptune ML documentation for more details). When training is finished, call the get_mapping and get_embeddings functions to download your new mapping.info and entity.npy files. After you have the entity and mapping files, the process to create the CSV file is identical.

    Finally, upload your embeddings to your desired Amazon S3 location:

    s3_destination = "s3://"+s3_bucket_uri+"/embeddings/"+"new_embeddings.csv"
    
    !aws s3 cp new_embeddings.csv {s3_destination}

    Make a note of this S3 location, because you will need it in Part 3.

    Clean up

    When you’re done using the solution, be sure to clean up any resources to avoid ongoing charges.

    Conclusion

    In this post, we discussed how to use Neptune ML to train GNN embeddings from IMDb data.

    Related applications of knowledge graph embeddings include out-of-catalog search, content recommendations, targeted advertising, predicting missing links, general search, and cohort analysis. Out-of-catalog search is the process of searching for content that you don’t own, and finding or recommending the content in your catalog that is as close as possible to what the user searched for. We dive deeper into out-of-catalog search in Part 3.
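    One common way to define "as close as possible" for embeddings is cosine similarity. The following sketch ranks catalog vectors against a query vector under that assumption. It is not necessarily the method used in Part 3, and the function name top_k_similar and the toy vectors are hypothetical.

```python
import numpy as np


def top_k_similar(query: np.ndarray, catalog: np.ndarray, k: int = 2) -> np.ndarray:
    """Return indices of the k catalog rows most cosine-similar to the query, best first."""
    catalog_norm = catalog / np.linalg.norm(catalog, axis=1, keepdims=True)
    query_norm = query / np.linalg.norm(query)
    scores = catalog_norm @ query_norm
    # Negate the scores so argsort orders from most to least similar
    return np.argsort(-scores)[:k]


# Toy two-dimensional embeddings; real embeddings have many more dimensions
catalog = np.array([[1.0, 0.0], [0.0, 1.0], [0.9, 0.1]])
query = np.array([0.8, 0.2])
nearest = top_k_similar(query, catalog, k=2)
print(nearest)
```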


    About the Authors

    Matthew Rhodes is a Data Scientist I working in the Amazon ML Solutions Lab. He specializes in building Machine Learning pipelines that involve concepts such as Natural Language Processing and Computer Vision.

    Divya Bhargavi is a Data Scientist and Media and Entertainment Vertical Lead at the Amazon ML Solutions Lab, where she solves high-value business problems for AWS customers using Machine Learning. She works on image/video understanding, knowledge graph recommendation systems, and predictive advertising use cases.

    Gaurav Rele is a Data Scientist at the Amazon ML Solutions Lab, where he works with AWS customers across different verticals to accelerate their use of machine learning and AWS Cloud services to solve their business challenges.

    Karan Sindwani is a Data Scientist at Amazon ML Solutions Lab, where he builds and deploys deep learning models. He specializes in the area of computer vision. In his spare time, he enjoys hiking.

    Soji Adeshina is an Applied Scientist at AWS where he develops graph neural network-based models for machine learning on graphs tasks with applications to fraud & abuse, knowledge graphs, recommender systems, and life sciences. In his spare time, he enjoys reading and cooking.

    Vidya Sagar Ravipati is a Manager at the Amazon ML Solutions Lab, where he leverages his vast experience in large-scale distributed systems and his passion for machine learning to help AWS customers across different industry verticals accelerate their AI and cloud adoption.


