Financial Services

In this section we cover data science use cases for the following sub-verticals of financial services:

Investment Banking

| Use Case | Use Case Sub Group | Dataset | Method | Benefit | Download Use Case Report |
| --- | --- | --- | --- | --- | --- |
| Investment Portfolio Optimization | Portfolio Optimization, Robo Advising | S&P 500 Index data from April 1, 2016 to Feb. 1, 2021 | Inverse Optimization and Deep Reinforcement Learning agents, the deep deterministic policy gradient (DDPG) algorithm, Gaussian exploration (OU process) | Portfolio rebalancing, determining the optimal investment portfolio, including portfolio diversification, based on market data and investor risk preference, less need for investor involvement with high expected ROI, potential increase in customer satisfaction if advisors are able to give them stock recommendations based on a proven prediction algorithm | Request Custom Report |
| Understanding Multiple Stock Trading Patterns for High ROI | Stock Price Prediction | China's CSI800 stock data from baostock.com | Temporal Routing Adaptor (TRA), Optimal Transport (OT) | Significant gains to investor portfolio, allows investors and advisors to view how different trading strategies can influence ROI over time to make better informed decisions | Request Custom Report |
| Investment Portfolio Management for Maximizing ROI | Portfolio Management | Stock market data from January 2, 2002 to March 24, 2020 | Deep Reinforcement Learning, Actor-Critic Reinforcement Learning, Graph Convolutional Networks, Spectral GCN, Autoencoders for feature extraction | Maximizing investor's portfolio ROI, identifying market patterns, evaluating trading strategies, and automating workflows | Request Custom Report |
| Stock Price Predictions | Stock Price Predictions | Raw market price data from the Chinese market index (China Securities Index 300 (CSI-300)) with 6 indicators (the closing price, high price, low price, opening price, amount, and volume), China A-share market from January 1, 2010 to December 31, 2019 via Tushare, an open-source financial data package | Time Series Graph Framework, time series embedding module, visibility graph algorithm for time series transformation, struc2vec, collective influence algorithm, attention-based RNNs, trading simulations to validate profitability and stability of the framework | Obtains the highest average return (47.91%) in trading simulations compared to state-of-the-art baselines, better stock and trade prediction insight, benefits of using a graph-based network for financial time series prediction compared to LSTM, DARNN, DARNN-SA, MFNN, and CA-SFCN | Request Custom Report |
| Start-up company evaluation and success prediction for investors, data for policy makers related to observable factors, such as diversity in the workplace, that could signal high growth potential | Start-up Investment Risk | Venture capital dataset from PitchBook covering worldwide VC investment-level activities from 1977 to 2019 | Incremental graph learning, graph representation learning, bipartite graphs, Graph Self-aTtention (GST) NN, node-level representation learning, sequential graph representation extraction | Less risk in start-up investments, start-up success forecasting, helps to reveal impacts on start-up companies from observable factors such as gender, education, and networking for policy makers | Request Custom Report |
| Trading strategy for cryptocurrency | Cryptocurrency Investment Strategy | Price data from the Binance cryptocurrency exchange API (free) from mid-2017 to April 2021 | k-Nearest Neighbor as an example-based learner, eXtreme Gradient Boosting and Random Forest classifiers as tree-based learners | Helps investors gain a higher ROI on their crypto investments and choose which cryptocurrencies to invest in | Request Custom Report |
| Stock Market Forecasting and Initial Public Offering (IPO) Planning | Stock Price Predictions | S&P500 index data from Yahoo Finance containing highest, lowest, opening, and closing values and the volume of traded stocks for a particular date | Regression to predict the closing price of a company's stock (Simple Linear, Polynomial, SVR, Decision Tree, and Random Forest), Classification to predict whether the stock will increase or decrease the next day (SVM, KNN, Logistic Regression, Naïve Bayes, Decision Tree, Random Forest) | Potential increase of ROI for investors, early knowledge of next trading day move decisions, more accurate value and share release predictions based on historical data for IPOs | Request Custom Report |
| Stock Market Manipulation Detection | Cryptocurrency Market Manipulation | Custom dataset of pump and dump schemes carried out on Binance. Trade records: Price, Operation Type (buy/sell), and the UNIX timestamp | Random Forest, AdaBoost | Identifying stock market manipulation in crypto and traditional markets, forecasting market manipulation | Request Custom Report |
| Portfolio Optimization with Predicted ROI | Portfolio Optimization | China Securities 100 Index from January 4, 2007 to December 31, 2015 | Random Forest, Support Vector Regression, LSTM, deep multi-layer perceptron (DMLP), CNN | Predicting ROI of investment portfolios, less risk on investments, potential increase of ROI | Request Custom Report |
| Stock Price Prediction through Social Media Sentiment Analysis | Stock Price Predictions through Social Media | Tweets from Twitter API | Bag of Words (NLP), Random Forest | Finding potential investments, identifying a potential instance of market manipulation, finding trending stocks among retail investors or others, stock price prediction | Request Custom Report |
| Sentiment Analysis of Investors via Social Media App for Investors for Stock Price Forecasting | Stock Price Predictions through Social Media | 13,771,091 comments/messages from StockTwits (a social media platform for investors) from 2018-2019 | Gradient Boosting Decision Trees (GBDT), LASSO Regression | Distinguishing between experienced and new investors, finding potential investments, understanding sentiment toward a company or stock, potential to predict future market manipulation, potential higher-yield portfolios | Request Custom Report |
| Day Trading System | Day/Swing Trading Strategies | Brazil Stock Exchange: 1824 varying days from January 2, 2015 to December 2019 | Deep Deterministic Policy Gradient (DDPG) | Finding ideal buy/sell times for day and swing traders, predicting stock prices throughout the day, potential for less financial risk for new and experienced day/swing traders | Request Custom Report |
| Cryptocurrency Price Prediction and Analysis | Cryptocurrency Stock Price Prediction | Crypto time series data scraped from Coin Market | Long Short-Term Memory (LSTM), Mean Absolute Error (MAE) | Predicting crypto prices, predicting best buy/sell time for the crypto market, gives businesses an idea of which cryptocurrencies may be beneficial to adopt for future payments | Request Custom Report |
| Recommendation of High-Return Stocks Based on Correlation Information | Stock Recommendation | Chinese stock market: involved 738 stocks, 2223 trading days, and five basic features (i.e., opening price, closing price, highest price, lowest price, and trading volume) | Stacked Graph Neural Network (GNN), Relation-aware Dynamic Attributed Graph Attention Network (RA-AGAT) | Gives investors a stock selection strategy with the lowest risk and highest return ratio for stocks in China, allows investors outside of China to know which companies would be the best to invest in if they are looking to diversify their portfolio with international stocks | Request Custom Report |
| Robo-advising for investment portfolio management | Robo Advising, Portfolio Management | Customer interaction with robo advisors at a German FinTech institution between February 13-28, 2018 | Partial Least Squares (PLS), Generalized Structured Component Analysis (GSCA) | More streamlined investing for customers, less overhead for human advisors, can allow human advisors to work with current customers in-depth while robo advisors perform initial onboarding for an overall idea of the customers' needs, potentially less paperwork with e-documents/agreements for initial portfolio setup | Request Custom Report |
| Stock Price Fluctuation Over Time based on News Reports | Stock Price Trends | Financial media data including title, author, source, and release time for 24 listed companies in the same industry from March 10, 2013 to December 30, 2017 | Deep Bidirectional LSTM, Word2Vec (NLP) | Gives investors a better understanding as to whether the price is increasing due to news alone or because of true company growth, gives investors more insight to make better, more rational investment decisions | Request Custom Report |
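
The Investment Portfolio Optimization use case above lists Gaussian exploration via an Ornstein-Uhlenbeck (OU) process alongside DDPG. As a minimal sketch of what such exploration noise can look like, assuming a simple Euler-Maruyama discretization, the snippet below simulates a mean-reverting noise sequence. The function name `ou_noise` and all parameter values (`theta`, `sigma`, `mu`, `dt`) are illustrative assumptions and are not taken from the listed use case.

```python
import math
import random


def ou_noise(n_steps, theta=0.15, sigma=0.2, mu=0.0, x0=0.0, dt=1.0, seed=None):
    """Simulate an Ornstein-Uhlenbeck process with an Euler-Maruyama step.

    All parameter defaults are illustrative assumptions.
    """
    rng = random.Random(seed)
    x = x0
    path = []
    for _ in range(n_steps):
        # The drift term pulls x back toward mu; the Gaussian term adds
        # noise, so consecutive samples are temporally correlated.
        x += theta * (mu - x) * dt + sigma * math.sqrt(dt) * rng.gauss(0.0, 1.0)
        path.append(x)
    return path


if __name__ == "__main__":
    # Hypothetical usage: perturb a deterministic action before executing it.
    action = 0.5
    noisy_actions = [action + eps for eps in ou_noise(5, seed=0)]
    print(noisy_actions)
```

Because each sample depends on the previous one, the perturbation drifts smoothly instead of jumping independently at every step, which is the usual reason this kind of noise is paired with continuous-action agents.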

Finance

| Use Case | Use Case Sub Group | Dataset | Method | Benefit | Download Use Case Report |
| --- | --- | --- | --- | --- | --- |
| Greater protection of financial records and finding weaknesses in current ML models using transaction records | Increased Data Security | Client transaction records listing type of transaction, transaction amount, and a timestamp for each transaction from three different financial institutions | Adversarial Attack and Defense using NLP Models: Masked Language Model (MLM), Fast Gradient Sign Method (FGSM), Concat FGSM, LM FGSM, Concat Sampling Fool (Concat SF), Sequential Concat Sampling Fool (Seq Concat SF) | Improved fraud detection and scoring system by banks, potential for more secure system to protect data | Request Custom Report |
| Differential Privacy | Increased Data Privacy/Security | Lending Club dataset from 2016-2017 | Logistic Regression and Gradient Boosting Trees for Differential Privacy (DP), and Random Forest to predict the credit conversion factor (CCF), beta regression, linear regression | Increased privacy while retrieving information, specifically for credit risk | Request Custom Report |
| Attracting New Customers and Customer Retention | New Customers and Customer Retention | Kaggle and UC Irvine Machine Learning Repository data relating to banking (accounts, loans, transactions, mobile banking, engagement, and credit accounts) and general information about an individual (demographic, occupation, health, wealth & assets, social activities) | Multi-Task Learning, dynamic feature selection, Logistic Regression | Creating new, attractive products based on dynamic feature selection, increase of transactions for financial institution | Request Custom Report |
| Credit Score Assessment and Default Probability | Probability of Default Based on Credit Score | Lending company data from the Polish Credit Bureau | Logistic Regression, Logistic Regression with weight of evidence (WOE) transformations, Random Forest, Gradient Boosting Models (GBM), and Extreme Gradient Boosting Model (XGB) | Automatically assessing credit risk and potential for default | Request Custom Report |
| Financial Crime and Money Laundering Detection and Prevention | Fraud Detection and Prevention | Bank data including: Transaction ID, Account ID, Account/Customer Type, Product/Transaction Type, Transaction Code, Transaction Branch, Source Bank, Destination Bank, Transaction Amount, Average Amount of Transactions in Previous Month, Transaction Currency, Credit/Debt Status, Country of Origin, Country of Destination, Credit Score | Classification, Anomaly Detection, Logical AND on results, Snorkel model | Proposed model aimed to detect fraud with minimal human intervention, larger-scale automatic fraud detection at financial institutions, decrease potential money laundering crimes at financial institutions | Request Custom Report |
| Customer Churn Prediction in Banking Industry | Customer Churn Prediction | Public dataset from Kaggle containing 28,382 records and 21 features (attributes) of demographic information, customer bank relationship, and transactional information such as current balance | Data Pre-processing: Exploratory Data Analysis and Bivariate Analysis; ML Models: Logistic Regression, Decision Tree, K-Nearest Neighbor, Random Forest | Customer retention, customer churn prediction | Request Custom Report |
| Credit Rating Prediction and Risk Assessment | Risk Assessment based on Credit Rating | Financial Ratings: a time-series dataset of income from about 302 companies from MCM (Contemporary Undergraduate Mathematical Contest in Modeling - http://en.mcm.edu.cn) | Lasso and recursive feature elimination to find key features. Random Forest, SVM, and Gradient Boosted Classification | Assessing credit risks for individuals and companies, insight to create new policies, minimize financial loss and risk for financial institutions | Request Custom Report |
| Counterfeit Currency Detection | Fraud Detection and Prevention | UC Irvine Machine Learning Repository: images of real and forged currency. 1372 total instances and 5 attributes | SVM, Random Forest, Logistic Regression, Naïve Bayes, Decision Tree, K-Nearest Neighbor | Currency fraud detection, decrease potential of crime related to counterfeit money, potentially less human error when checking for real vs fake currency | Request Custom Report |
| Risk Prediction for Bank Loan Approvals | Bank Loan Risk Assessment | Kaggle Competition Dataset. Loan applicants featuring 13 attributes: loan ID, gender, marriage status, dependents, education, self employment, income, co-applicant income, loan amount, loan amount term, credit history, property area, and loan status | Logistic Regression, Random Forest | Less financial risk for banks pertaining to loans, automated approval/denial for loan applications, faster approval/denial process | Request Custom Report |
| Marketing and Sales in Businesses and Banks | New Customers and Customer Retention | UC Irvine Machine Learning Repository: direct marketing campaigns of a Portuguese banking institution. It consists of 45k instances with 16 different attributes (age, job, marital, education, default, balance, housing, loan, contact, day, month, duration, campaign, pdays, previous, poutcome, and the outcome) | Decision Tree, K-Nearest Neighbor, Random Forest, Naïve Bayes, Support Vector Machine | Sales Forecasting, Predicting Customer Preferences, Business and Customer Growth, Enhance Sales Agent Performance | Request Custom Report |
| Credit Risk Prediction and Probability of Customer Default | Credit Risk and Loan Defaulting | Dataset 1: customer demographics, credit history, loan amount, loan duration, etc.; Dataset 2: customer demographics, history of payment, credit data, payments and bill statements of credit card clients residing in Taiwan from April 2005 to September 2005 | Feature Extraction: Principal Component Analysis (PCA) and Linear Discriminant Analysis (LDA); ML: Multiple Linear Regression, Logistic Regression, SVM, KNN, Kernel SVM, Decision Tree, Random Forest, Naïve Bayes, and Gaussian Naïve Bayes | Allows financial institutions to invest in safer options, less risk of losing money to customers who default on their loan, potential for a more stable financial balance throughout the years, more efficient approval/denial process based on output, reducing capital loss | Request Custom Report |
| Predicting Loan Defaulters | Loan Defaulting | Real-time data from Kaggle which illustrates loan administration experience within the U.S. small business social circle | Principal Component Analysis (PCA), Linear Discriminant Analysis (LDA), Logistic Regression with Stochastic Gradient Descent (SGD), Random Forest, K-Nearest Neighbor (KNN) | Balanced or increased capital, more efficient loan approval process, more efficient lending process without guessing about a customer's potential to default and cause the bank to lose money, less risk to the employee or company when lending money to customers | Request Custom Report |
| Credit Risk Assessment for Supply Chain Finance | Credit Risk Assessment | Small and Mid-sized Enterprise data from the Computer and Electronic Communications Manufacturing Industry | Firefly Algorithm Support Vector Machine (FA-SVM) | Analyzing the credit risk of supply chain companies rather than individuals, less risk of financial loss when giving out business loans, understanding which companies the bank can trust to increase or decrease loan credit to mitigate a loss | Request Custom Report |
| Identifying Fraudulent Financial Statements | Financial Crime Prevention | Fraudulent financial statements from Chinese publicly traded companies from 2006-2019 | Pearson Correlation Coefficient, Naïve Bayes (Kernel), SVM, KNN, Decision Trees, Logistic Regression | Capital gain through the reduction of fraud cases, early fraud detection to mitigate loss, less risk in lending | Request Custom Report |
| Implementing Blockchain Technology in the Banking Sector | Blockchain in Banking | Blockchain consultants, blockchain marketing experts, CEOs/business heads that are in the process of advising, consulting, or implementing blockchain technology | Confirmatory Factor Analysis (CFA), Principal Component Analysis (PCA) | Increased privacy and security for company and customer data, potentially higher customer trust due to more complex security measures, improved transparency and traceability of transactions, streamlined business process, overall savings through a more secure and efficient banking and security process, reduced fraud, reduced human intervention | Request Custom Report |
| Creating a More Efficient ATM Service Model | ATM Services | Transactions in ruble ATMs with a cash recycling function containing records of cash deposits and withdrawals, as well as on collections from February to September 2018 | Linear Regression, Lasso, Ridge Regression, Elastic Net, Gradient Decision Trees, Random Forests, and SVMs | More efficient cashflow management in ATMs, helps to ensure the right amount of cash is available in the ATM without risking a loss if the ATM malfunctions or the excess money is taken out somehow, reduces cost for financial monitoring, predicting supply and demand of ATM and overall company cashflow from bank to customer and customer to bank | Request Custom Report |
| Detecting Fraudulent Credit Card Transactions | Fraud Detection and Prevention | Customer credit card transaction data from ULB Machine Learning Group downloaded from Kaggle | Autoencoders, Multi-Layer Perceptron, KNN, Logistic Regression | Minimize losses from fraudulent transactions for credit card issuing banks, more accurate fraud detection, improve customer trust with better fraud detection | Request Custom Report |
| Credit Score Evaluation | Credit Risk Assessment | Customer credit data | Naïve Bayes, Logistic Regression, Random Forest, Decision Tree, KNN | Lower risk in P2P lending, less human intervention, lower overhead costs due to a more automated process | Request Custom Report |
| Bank Customer Classification to Provide Advice for Banks to Better Manage Customer Relationships | Bank/Customer Relationship | Customer data consisting of customer ID, age, and credit score | Multi-Class Extreme Learning Machine (ELM), Label Proportions (LLP) | Building better relationships with customers, customer retention and higher reviews, more referrals, brand/customer loyalty, lower customer churn rate | Request Custom Report |
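
The Credit Score Assessment and Default Probability use case above pairs logistic regression with weight of evidence (WOE) transformations. As a rough sketch of that preprocessing step, the snippet below encodes one categorical feature, assuming the common convention WOE = ln(share of non-defaulters in a category / share of defaulters in that category). The function name `woe_table`, the smoothing constant, and the toy data are illustrative assumptions and are not drawn from the listed study.

```python
import math
from collections import Counter


def woe_table(categories, defaulted, smoothing=0.5):
    """Return {category: WOE} using ln(%good / %bad) per category.

    `smoothing` is added to every count so that empty cells do not
    produce log(0); its value is an illustrative assumption.
    """
    good = Counter(c for c, d in zip(categories, defaulted) if not d)
    bad = Counter(c for c, d in zip(categories, defaulted) if d)
    levels = set(categories)
    total_good = sum(good.values()) + smoothing * len(levels)
    total_bad = sum(bad.values()) + smoothing * len(levels)
    table = {}
    for level in levels:
        share_good = (good[level] + smoothing) / total_good
        share_bad = (bad[level] + smoothing) / total_bad
        table[level] = math.log(share_good / share_bad)
    return table


if __name__ == "__main__":
    # Toy data: housing status and whether the borrower defaulted (1 = yes).
    housing = ["own", "own", "own", "own", "rent", "rent", "rent", "rent"]
    default = [0, 0, 0, 1, 1, 1, 1, 0]
    woe = woe_table(housing, default)
    # Each category is replaced by its WOE value before fitting the model.
    encoded = [woe[h] for h in housing]
    print(woe, encoded)
```

Under this convention a positive WOE marks a category with relatively more non-defaulters, and the encoded column can be passed to a logistic regression in place of the raw category labels.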