Corporate Distress Prediction Model

1. Research Topic

Construction of a corporate distress prediction model combining text analysis and financial ratios

2. Research Background and Objectives

  • Analyzing and predicting corporate credit risk is important for the soundness of the financial system, yet traditional credit risk assessment relies mainly on financial ratios and structural models.

    • Structural credit risk models, like the Black-Scholes-Merton model, treat a firm’s equity as a call option on its assets and estimate the firm’s default probability from asset volatility, the default point, the drift of asset value, and similar inputs. This approach captures complex financial market phenomena in a mathematical model, but its assumptions do not always match reality.

    • With recent advances in machine learning technology and the increased use of big data, it is now possible to analyze non-financial, unstructured text data such as news articles, credit rating agency reports, and management announcements.

    • Such text data contain a variety of information that can affect a company’s credit risk, so text mining techniques are needed to extract it for use in credit risk prediction.

  • Against this background, the objective of this research is to develop a credit risk prediction model that assesses a company’s credit risk more comprehensively and accurately by integrating financial ratio analysis and text mining techniques.

    • First, the company’s financial soundness is evaluated quantitatively by analyzing profitability, liquidity, capital structure, operating efficiency, and similar measures from its financial statement data.

    • Next, a two-step approach is applied to unstructured text data such as news articles, credit rating agency reports, and management announcements: topic modeling first extracts the topics related to credit risk, and the possibility of distress is then estimated from the texts assigned to those topics.

    • The potential risk factors of corporate credit are quantified using the results of the text mining model, and then combined with the results of financial ratio analysis to produce the final corporate credit risk score.

      • This model combines the traditional structural model approach with timely text data to predict corporate credit risk and to support effective policy responses to corporate risk.
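The structural approach mentioned in the background above can be illustrated with a minimal sketch of a Merton-style distance to default. It assumes the asset value, default point, drift, and asset volatility are already known; in practice the asset value and its volatility are unobserved and have to be estimated from equity data. The function names and the example inputs are illustrative and not part of the proposal.

```python
import math


def normal_cdf(x: float) -> float:
    """Standard normal cumulative distribution function."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def distance_to_default(asset_value: float, default_point: float,
                        drift: float, asset_vol: float,
                        horizon: float = 1.0) -> float:
    """Number of standard deviations between expected log assets and the default point."""
    if min(asset_value, default_point, asset_vol, horizon) <= 0:
        raise ValueError("asset_value, default_point, asset_vol and horizon must be positive")
    numerator = (math.log(asset_value / default_point)
                 + (drift - 0.5 * asset_vol ** 2) * horizon)
    return numerator / (asset_vol * math.sqrt(horizon))


def default_probability(asset_value: float, default_point: float,
                        drift: float, asset_vol: float,
                        horizon: float = 1.0) -> float:
    """Probability that assets fall below the default point at the horizon."""
    dd = distance_to_default(asset_value, default_point, drift, asset_vol, horizon)
    return normal_cdf(-dd)


# Hypothetical firm: assets 120, default point 100, 5% drift, 25% asset volatility.
pd_one_year = default_probability(120.0, 100.0, 0.05, 0.25)
```

The sketch also shows where the approach depends on its assumptions: the result is only as good as the assumed lognormal asset dynamics and the chosen default point.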

3. Key Content and Analysis Method

  • We take a comprehensive approach that combines structured financial ratio data with unstructured text data to predict a company’s credit risk more accurately and in real time.

    • This approach is based on three core elements: financial ratio analysis, text analysis, and the model that combines them.

Financial Ratio Analysis

  • First, we measure the financial soundness of a company through financial ratios such as profitability ratios (e.g., return on equity, profit margin), debt ratios (e.g., net debt/equity), coverage ratios (e.g., EBITDA/interest, cash flow/debt), liquidity ratios (e.g., current ratio, cash/debt), and growth rates (e.g., ROE growth rate, stability of EPS growth).

    • These ratios are essential indicators of a company’s financial situation and give an objective, quantitative basis of comparison for assessing its credit risk.
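As a minimal sketch of the ratio calculation described above, the following computes the single-period ratios named in the list from one set of statement items. The `Financials` fields, the function names, and the choice to return `None` when a denominator is zero are assumptions made for illustration; growth rates are left out because they need more than one period of data.

```python
from dataclasses import dataclass
from typing import Dict, Optional


@dataclass
class Financials:
    """One period of statement items (hypothetical field names)."""
    net_income: float
    revenue: float
    equity: float
    total_debt: float
    cash: float
    ebitda: float
    interest_expense: float
    operating_cash_flow: float
    current_assets: float
    current_liabilities: float


def safe_div(numerator: float, denominator: float) -> Optional[float]:
    """Return the ratio, or None when the denominator is zero."""
    return numerator / denominator if denominator else None


def compute_ratios(f: Financials) -> Dict[str, Optional[float]]:
    """Profitability, leverage, coverage, and liquidity ratios for one period."""
    return {
        "return_on_equity": safe_div(f.net_income, f.equity),
        "profit_margin": safe_div(f.net_income, f.revenue),
        "net_debt_to_equity": safe_div(f.total_debt - f.cash, f.equity),
        "ebitda_to_interest": safe_div(f.ebitda, f.interest_expense),
        "cash_flow_to_debt": safe_div(f.operating_cash_flow, f.total_debt),
        "current_ratio": safe_div(f.current_assets, f.current_liabilities),
        "cash_to_debt": safe_div(f.cash, f.total_debt),
    }


# Made-up figures for a single firm and period.
example = Financials(net_income=50, revenue=500, equity=400, total_debt=300,
                     cash=60, ebitda=120, interest_expense=20,
                     operating_cash_flow=90, current_assets=250,
                     current_liabilities=125)
ratios = compute_ratios(example)
```

Returning `None` instead of raising keeps firms with, for example, no debt in the sample, so a later step can decide how to treat the missing ratio.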

Text Analysis

  • We extract information related to credit risk from unstructured text data through text mining.

  • This provides more immediate and direct information than the backward-looking information from financial ratio analysis and helps predict the company’s insolvency risk.

  • We collect real-time news articles, credit rating agency reports, and the company’s management announcements, and identify language patterns related to credit risk.

  • (Topic Model) To analyze the text data effectively, we extract its major topics using topic modeling.

  • We use official reports from credit rating agencies to train a topic model* that can determine which news articles and reports contain information related to the company’s credit rating.

    * Techniques like Latent Dirichlet Allocation (LDA) or Latent Semantic Indexing (LSI) could be used.

  • (Insolvency Estimation Model) For articles classified under topics closely related to corporate credit ratings, we measure how strongly they signal possible insolvency using a specialized language model* trained on a separate set of texts that indicate corporate insolvency.

    * A fine-tuned language model is a foundation model, such as GPT or LLAMA, that performs well on natural language processing tasks and is trained further for the purpose of the analysis.
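To make the topic-model step above concrete, here is a minimal sketch of LDA fitted by collapsed Gibbs sampling, using only the Python standard library. It covers topic extraction only, not the fine-tuned language model, and it is not the proposal's implementation: a real project would more likely rely on an established topic-modeling library. The function name `fit_lda`, the hyperparameter defaults, and the toy corpus are all assumptions.

```python
import random
from typing import List, Tuple


def fit_lda(docs: List[List[str]], n_topics: int, n_iter: int = 200,
            alpha: float = 0.1, beta: float = 0.01,
            seed: int = 0) -> Tuple[List[List[float]], List[List[float]], List[str]]:
    """Fit LDA on tokenized docs; return (doc-topic, topic-word, vocabulary)."""
    rng = random.Random(seed)
    vocab = sorted({w for doc in docs for w in doc})
    word_id = {w: i for i, w in enumerate(vocab)}
    n_words = len(vocab)

    doc_topic_n = [[0] * n_topics for _ in docs]            # tokens per (doc, topic)
    topic_word_n = [[0] * n_words for _ in range(n_topics)]  # tokens per (topic, word)
    topic_n = [0] * n_topics                                 # tokens per topic
    assign = []                                              # topic of every token

    # Start from a random topic assignment for every token.
    for d, doc in enumerate(docs):
        doc_assign = []
        for w in doc:
            k = rng.randrange(n_topics)
            doc_assign.append(k)
            doc_topic_n[d][k] += 1
            topic_word_n[k][word_id[w]] += 1
            topic_n[k] += 1
        assign.append(doc_assign)

    # Resample each token's topic conditional on all other assignments.
    for _ in range(n_iter):
        for d, doc in enumerate(docs):
            for i, w in enumerate(doc):
                wid, k = word_id[w], assign[d][i]
                doc_topic_n[d][k] -= 1
                topic_word_n[k][wid] -= 1
                topic_n[k] -= 1
                weights = [(doc_topic_n[d][t] + alpha)
                           * (topic_word_n[t][wid] + beta)
                           / (topic_n[t] + n_words * beta)
                           for t in range(n_topics)]
                k = rng.choices(range(n_topics), weights=weights)[0]
                assign[d][i] = k
                doc_topic_n[d][k] += 1
                topic_word_n[k][wid] += 1
                topic_n[k] += 1

    # Smoothed estimates of the document-topic and topic-word distributions.
    doc_topic = [[(doc_topic_n[d][t] + alpha) / (len(doc) + n_topics * alpha)
                  for t in range(n_topics)] for d, doc in enumerate(docs)]
    topic_word = [[(topic_word_n[t][v] + beta) / (topic_n[t] + n_words * beta)
                   for v in range(n_words)] for t in range(n_topics)]
    return doc_topic, topic_word, vocab


# Toy corpus: two credit-related snippets and two unrelated ones (made up).
corpus = [
    "rating downgrade debt covenant default".split(),
    "debt default downgrade liquidity covenant".split(),
    "product launch store opening customers".split(),
    "store customers product opening launch".split(),
]
doc_topic, topic_word, vocab = fit_lda(corpus, n_topics=2)
```

In the setting described above, the model would be trained on credit rating agency reports, and the document-topic shares of a new article would then indicate how much of it concerns rating-related topics.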

[Figure 1] Composition and flow of the corporate insolvency prediction model

graph TB
    A[Financial Ratio Analysis] --> |Calculate Financial Ratios| AA[Ratios: Profitability, Leverage, Coverage, Liquidity, Growth]
    B[Textual Analysis] --> |Collect News Articles| BB[Identify Key Language Patterns]
    BB --> |Machine Learning Algorithms| CC[Indicative Features of Financial Distress]
    AA --> D[Combination of Models]
    CC --> D
    D --> |Adjust Weight based on Text Volume| E[Holistic Model]
    E --> F[Handle Missing Data]
    F --> |Incorporate Momentum in Credit Risk| G[Comprehensive Credit Risk Prediction]
    G --> H[Final Score: Comprehensive Evaluation of Company's Credit Risk]

Model Combination

  • As the amount of text data that can be collected varies depending on the size and characteristics of the company, we complement this by combining the results of financial ratio analysis and text analysis.

    • Because the weight given to text mining is adjusted according to the amount of text collected per period, the combination can reflect how much attention the market pays to the company’s financial issues and how important it considers them, thereby representing the momentum of credit risk.

  • Using the text-based credit score and financial ratio data, we estimate a panel probit model with fixed effects to measure the risk of corporate insolvency.

    \[\begin{split} Pr(\Delta C_{i,t}<0|x_{i,t},\alpha_i) = \beta_1 \Delta \ln \Pi_{i,t} + \beta_2 \Delta \ln D_{i,t} + \beta_3 \Delta \ln EBITDA_{i,t} \\ + \beta_4 \Delta \ln Liq_{i,t} + \beta_5 \Delta \ln G_{i,t} + \beta_6 \Delta \ln S_{i,t} + X_{i,t} + \alpha_i + \tau_t + \upsilon_{i,t} \end{split}\]

    \(C_{i,t}\) represents the credit rating of company \(i\) at period \(t\), \(\Pi_{i,t}\) represents the profitability ratio, \(D_{i,t}\) represents the debt ratio, \(EBITDA_{i,t}\) represents the interest coverage ratio, \(Liq_{i,t}\) represents the liquidity ratio, \(G_{i,t}\) represents the growth rate, \(S_{i,t}\) represents the text-based credit score, \(X_{i,t}\) represents control variables reflecting company characteristics, \(\alpha_i\) represents company-specific fixed effects, \(\tau_t\) represents time-specific fixed effects, and \(\upsilon_{i,t}\) is the error term. \(\Delta\) denotes the change from the previous period.
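The combination step above can be sketched as follows, under stated assumptions. The proposal does not specify how the text weight depends on text volume, so the saturating form `n / (n + k)` is an assumption, as is applying the weight by scaling the text-score regressor. The coefficient values are hypothetical placeholders rather than estimates, and the standard normal CDF is used as the probit link on the linear index.

```python
import math
from typing import Dict


def normal_cdf(x: float) -> float:
    """Standard normal CDF, the link function of a probit model."""
    return 0.5 * (1.0 + math.erf(x / math.sqrt(2.0)))


def text_weight(n_articles: int, half_weight_at: float = 20.0) -> float:
    """Weight in [0, 1) that rises with text volume (assumed functional form)."""
    if n_articles < 0:
        raise ValueError("n_articles must be non-negative")
    return n_articles / (n_articles + half_weight_at)


def downgrade_probability(changes: Dict[str, float], coefs: Dict[str, float],
                          n_articles: int, fixed_effects: float = 0.0) -> float:
    """Probit probability of a rating decline from log-changes in the regressors.

    `changes` maps regressor names to their log-changes; a regressor missing
    from `changes` is treated as unchanged, which is a crude stand-in for
    proper missing-data handling.
    """
    index = fixed_effects  # firm and time effects, passed in as one number
    for name, beta in coefs.items():
        x = changes.get(name, 0.0)
        if name == "text_score":
            x *= text_weight(n_articles)  # shrink the text signal when text is scarce
        index += beta * x
    return normal_cdf(index)


# Hypothetical coefficients, for illustration only.
coefs = {"profitability": -0.8, "debt": 0.6, "coverage": -0.5,
         "liquidity": -0.4, "growth": -0.3, "text_score": -1.0}

# Same fall in the text-based score, observed with few and with many articles.
p_few = downgrade_probability({"text_score": -0.2}, coefs, n_articles=5)
p_many = downgrade_probability({"text_score": -0.2}, coefs, n_articles=80)
```

With these placeholder values, the same drop in the text-based score moves the probability further when it is backed by more articles, which is the sense in which the weighting reflects market attention.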

4. Policy Implications

  • This study aims to improve the accuracy of corporate credit evaluation by using a hybrid approach that combines financial data and text data, in contrast to traditional methodologies that treat the two independently.

  • Using real-time data to predict corporate insolvency risk allows us to reflect events that affect a company’s financial soundness more accurately, which is expected to become more important in a rapidly changing financial environment.

  • The performance of the model heavily depends on the selected text sources and the quality of the data, so future research can improve model performance by using various text data sources or improving text mining and topic modeling methods.

  • The model can be expanded to include non-financial factors that influence credit risk, such as the company’s environmental, social, and governance (ESG) indicators.