Introducing CYMO: Next-Generation Text Mining and Analytics Tool

Are you tired of black-box AI models, where decisions seem to emerge from a mysterious void? Meet CYMO by Exaia – your solution to building transparent, explainable, and robust AI models.

Built on 30+ years of expertise in computational linguistics, NLP, machine learning, text analytics and speech recognition, CYMO is designed to unlock deep insights from unstructured data, enabling the creation of AI models that are both highly performant and interpretable.

Why Transparency Is Essential in High-Stakes Domains

Artificial intelligence (AI) and algorithmic decision-making have a significant impact on our daily lives. In high-stakes areas such as healthcare, implementing transparent AI is not only beneficial but essential to ensure optimal outcomes and minimize potential risks.

CYMO ensures that machine learning and deep learning models trained on its expert-engineered features deliver high levels of explainability and robustness. By providing a multidimensional view of language through its comprehensive feature set, CYMO enables models to make clear, interpretable decisions, reducing the opacity often associated with AI systems.

This transparency fosters trust, supports better decision-making, and facilitates compliance with ethical and regulatory standards, particularly in high-stakes domains such as mental health diagnostics, deception detection, language skills assessment, and justice.

The features implemented in CYMO also enhance the performance of AI models by capturing nuanced linguistic patterns, ensuring that the predictions and classifications they make are not only transparent but also highly accurate and robust across diverse applications.

Table comparing LLM and CYMO performance in bipolar disorder detection.

This table demonstrates a use case for social media health mining, specifically bipolar disorder detection, by comparing the performance and generalizability of black-box LLMs and more transparent ML models trained on expert-engineered features from CYMO. The CYMO-based model outperforms the fine-tuned LLM in both within-domain and out-of-domain settings, showing superior accuracy and robustness.

Leverage Expert-Engineered Features for Comprehensive Insights

CYMO delivers 344 expert-engineered features designed to give you precise, multifaceted measurements from text and speech. These features are optimized for natural language processing (NLP) and machine learning (ML) tasks, enhancing the accuracy and transparency of your AI models. The selection and implementation of these features are informed by extensive interdisciplinary research and literature on how humans learn and process language, the impact of affective and cognitive factors on language processing, the relationship between language and mental health, affective computing, stylistics, and readability.

The features are organized into eight distinct categories, as follows:

Syntactic Complexity and Sophistication: Features related to sentence structure, including production unit length, sentence complexity, subordination, coordination, and specific structures.
Lexical Richness: The diversity and sophistication of vocabulary, including lexical diversity, lexical sophistication, lexical density, and word prevalence.
Cohesion: The connections between ideas, such as lexical overlap and the use of connectives to link different parts of the text.
Stylistics: Patterns of language use pertaining to register/genre/style, captured through frequency and distribution of n-grams.
Readability: Measures of text comprehensibility, including factors like sentence length, word choice, and grammatical complexity, exemplified by Flesch-Kincaid Grade Level.
Grammatical Categories: Features pertaining to specific grammatical functions, including prepositions, determiners, auxiliary verbs, pronouns, conjunctions, and quantifiers.
Topical Categories: Features associated with thematic domains, including Art, Business, Education, Entertainment, Food, Health, Music, Politics, Science, Sports, and Technology.
Emotion Categories: Sentiment and emotions encoded based on well-established psychological models.
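
To make the categories above more concrete, here is a minimal sketch of three well-known measures of the kind they describe: type-token ratio (lexical diversity), lexical density, and Flesch-Kincaid Grade Level (readability). This is not CYMO's implementation or API. The function names, the regex-based tokenizer, the small function-word list, and the vowel-group syllable heuristic are all assumptions made for illustration; a production tool would typically rely on proper linguistic preprocessing.

```python
import re

# Illustrative function-word list (an assumption); real tools use POS tagging.
FUNCTION_WORDS = {
    "a", "an", "the", "and", "or", "but", "of", "in", "on", "at",
    "to", "is", "are", "was", "were", "it", "this", "that", "for", "with",
}


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def tokenize(text):
    """Lowercased word tokens; keeps internal apostrophes (e.g. "don't")."""
    return re.findall(r"[a-z]+(?:'[a-z]+)?", text.lower())


def count_syllables(word):
    """Rough heuristic: one syllable per group of consecutive vowels."""
    return max(1, len(re.findall(r"[aeiouy]+", word.lower())))


def type_token_ratio(tokens):
    """Lexical diversity: distinct words divided by total words."""
    return len(set(tokens)) / len(tokens) if tokens else 0.0


def lexical_density(tokens):
    """Share of tokens that are not in the function-word list."""
    if not tokens:
        return 0.0
    content = [t for t in tokens if t not in FUNCTION_WORDS]
    return len(content) / len(tokens)


def flesch_kincaid_grade(text):
    """Flesch-Kincaid Grade Level using the standard published coefficients."""
    sentences = split_sentences(text)
    tokens = tokenize(text)
    if not sentences or not tokens:
        return 0.0
    syllables = sum(count_syllables(t) for t in tokens)
    words_per_sentence = len(tokens) / len(sentences)
    syllables_per_word = syllables / len(tokens)
    return 0.39 * words_per_sentence + 11.8 * syllables_per_word - 15.59


if __name__ == "__main__":
    sample = "The cat sat on the mat. The dog sat too."
    words = tokenize(sample)
    print("type-token ratio:", type_token_ratio(words))
    print("lexical density:", lexical_density(words))
    print("Flesch-Kincaid grade:", flesch_kincaid_grade(sample))
```

Because each measure is a named, human-readable quantity, a model trained on such features can be inspected feature by feature, which is the property the transparency argument rests on.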

Advanced Measurement Techniques for High-Resolution Insights

CYMO goes beyond traditional text mining and analytics with its sliding-window approach. This technique measures text features at high resolution, processing data sentence by sentence. By capturing how each feature is distributed across the text rather than reducing it to a single document-level score, CYMO analyzes every segment and offers localized insights that uncover subtle patterns often missed by traditional methods.
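
The general idea of a sentence-level sliding window can be sketched as follows. This is a hedged illustration, not CYMO's actual algorithm: the window size, the step, the naive sentence splitter, and the example feature (mean sentence length in words) are assumptions chosen for brevity, and the function names are hypothetical.

```python
import re
from statistics import mean, pstdev


def split_sentences(text):
    """Naive sentence splitter: break after '.', '!' or '?' plus whitespace."""
    return [s for s in re.split(r"(?<=[.!?])\s+", text.strip()) if s]


def mean_sentence_length(sentences):
    """Example feature: average number of word tokens per sentence."""
    counts = [len(re.findall(r"[A-Za-z]+(?:'[A-Za-z]+)?", s)) for s in sentences]
    return mean(counts)


def sliding_window_scores(text, feature, size=2, step=1):
    """Apply `feature` to each window of `size` consecutive sentences.

    The window advances by `step` sentences. Texts with no more than
    `size` sentences are scored as a single window.
    """
    if size < 1 or step < 1:
        raise ValueError("size and step must be positive")
    sentences = split_sentences(text)
    if not sentences:
        return []
    if len(sentences) <= size:
        return [feature(sentences)]
    last_start = len(sentences) - size
    return [feature(sentences[i:i + size]) for i in range(0, last_start + 1, step)]


def summarize(scores):
    """Describe the distribution of window scores instead of one global value."""
    if not scores:
        return {}
    return {
        "mean": mean(scores),
        "stdev": pstdev(scores),
        "min": min(scores),
        "max": max(scores),
    }


if __name__ == "__main__":
    text = ("I ran. She walked home slowly. "
            "We all stayed inside today because it rained.")
    scores = sliding_window_scores(text, mean_sentence_length, size=2, step=1)
    print(scores)
    print(summarize(scores))
```

The per-window scores show where in the text a feature rises or falls, and the summary statistics describe its spread, both of which a single document-level average would hide.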

Three Selected Use Cases and Their Real-World Impact

Empowering Developers, Researchers, and Data Scientists in Building XAI-Powered Models

Digital Biomarkers for Brain and Mental Health

Unlock the potential of CYMO to create explainable AI models leveraging digital biomarkers for advanced brain and mental health diagnostics. These biomarkers are revolutionizing healthcare, enabling breakthroughs in disease prevention, diagnostics, and monitoring.

Uncover Subtle Signals: Detect subtle statistical signals in speech and text related to neurodegenerative and psychiatric disorders.
Trust in Rigorous Validation: Rely on biomarkers validated through computational and clinical processes for accurate diagnostics.
Enhance Predictive Models: Develop advanced models for early and precise disorder detection and ongoing monitoring.
Monitor Disease Progression: Track disease progression and assess treatment effects over time.

For more information, explore: Digital Biomarkers Use Case

Verbal Deception Detection

CYMO enables the extraction of expert-engineered features from verbal behavior, supporting the detection of deception across key sectors like cybersecurity, education, law enforcement, and media.

Cybersecurity: Identify phishing and fraudulent communications.
Education & Research: Uphold academic integrity and detect misleading research.
Law Enforcement: Detect false testimonies and criminal deception.
Media: Help journalists fact-check and uncover misinformation.
Build Explainable AI Models: Develop transparent models that detect deception and fraud with clear insights.

Language Proficiency and Readability Assessment

CYMO extracts features that provide precise operationalizations of CEFR descriptors, going beyond traditional metrics to offer more comprehensive language proficiency assessments.

Personalized Learning: Tailor content and lessons to individual proficiency levels.
Efficient Placement: Accurately assign students to appropriate language classes.
Streamlined Recruitment: Assess job candidates’ language proficiency with precision.
Content Localization: Tailor content for specific audiences based on language proficiency.

For use cases and real-world applications, explore our scientific research at Scientific Research Publications.

Get Started with CYMO

Whether you’re a developer building next-gen AI tools, a researcher tackling linguistic phenomena, or a data scientist solving business problems, CYMO delivers precise, expert-engineered features that empower the creation of explainable and robust AI systems. Get started today with CYMO. For a detailed guide, check out our CYMO tutorial on GitHub to learn how to leverage its full capabilities.
