A major research problem in Artificial Neural Networks (NNs) is reducing the number of model parameters. The available approaches are pruning methods, which remove connections from a dense model, and natively sparse models, which are trained as sparse from the outset using meta-heuristics to guarantee their topological properties. In this paper, the limits of both approaches are discussed. A novel hybrid training approach is developed and evaluated, based on a linear combination of sparse unstructured NNs that are joint because they share connections. Such NNs dynamically compete during optimization: the less important networks are iteratively pruned until only the most important network remains. The method, called Competitive Joint Unstructured NNs (CJUNNs), is formalized together with an efficient derivation in tensor algebra, which has been implemented and publicly released. Experimental results show its effectiveness on benchmark datasets in comparison with structured pruning.
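As an illustration of the competitive joint mechanism, the following is a minimal PyTorch sketch (not the released implementation; the layer sizes, mask generation, and pruning criterion are illustrative assumptions): K sparse subnetworks share one weight tensor through binary masks, their outputs are linearly combined, and the subnetwork with the smallest combination coefficient is iteratively pruned.

```python
import torch
import torch.nn as nn

class JointSparseLinear(nn.Module):
    """K sparse subnetworks sharing one weight tensor via binary masks."""
    def __init__(self, in_f, out_f, k=4, density=0.3):
        super().__init__()
        self.weight = nn.Parameter(torch.randn(out_f, in_f) * 0.01)
        # Random overlapping masks: connections may be shared across nets
        self.register_buffer("masks", (torch.rand(k, out_f, in_f) < density).float())
        self.alpha = nn.Parameter(torch.ones(k) / k)  # combination coefficients

    def forward(self, x):
        outs = torch.stack([x @ (self.weight * m).t() for m in self.masks])
        return torch.einsum("k,kbo->bo", self.alpha, outs)

    def prune_weakest(self):
        # Competition: drop the subnetwork with the smallest |alpha|
        k = int(torch.argmin(self.alpha.abs()))
        keep = [i for i in range(len(self.alpha)) if i != k]
        self.masks = self.masks[keep]
        self.alpha = nn.Parameter(self.alpha.data[keep])
```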
Oral squamous cell carcinoma recognition presents a challenge due to late diagnosis and costly data acquisition. A cost-efficient, computerized screening system is crucial for early disease detection, minimizing the need for expert intervention and expensive analysis. Moreover, transparency is essential to align these systems with critical-sector applications. Explainable Artificial Intelligence (XAI) provides techniques for understanding models. However, current XAI is mostly data-driven and focused on addressing developers' requirements for improving models, rather than clinical users' demands for expressing relevant insights. Among the different XAI strategies, we propose a solution that combines the Case-Based Reasoning paradigm, to provide visual output explanations, with Informed Deep Learning (IDL), to integrate medical knowledge within the system. A key aspect of our solution lies in its capability to handle data imperfections, including labeling inaccuracies and artifacts, thanks to an ensemble architecture on top of the deep learning (DL) workflow. We conducted several experimental benchmarks on a dataset collected in collaboration with medical centers. Our findings reveal that employing the IDL approach yields an accuracy of 85%, surpassing the 77% accuracy achieved by DL alone. Furthermore, we measured the human-centered explainability of the two approaches, finding that IDL generates explanations more congruent with clinical users' demands.
In recent years, the robotics field has witnessed an unprecedented surge in the development of humanoid robots, which bear an increasingly close resemblance to human beings in appearance and functionality. This evolution has presented researchers with complex challenges, particularly in the domain of controlling the increasing number of robotic motors that animate these lifelike figures. This paper focuses on a novel approach to managing the intricate facial expressions of a humanoid face endowed with 22 degrees of freedom. We introduce a groundbreaking inverse kinematic model that leverages deep learning regression techniques to bridge the gap between the visual representation of human facial expressions and the corresponding servo motor configurations required to replicate these expressions. By mapping image space to servo motor space, our model enables precise, dynamic control over facial expressions, enhancing the robot's ability to engage in more nuanced and human-like interactions. Our methodology not only addresses the technical complexities associated with the fine-tuned control of facial motor servos but also contributes to the broader discourse on improving humanoid robots' social adaptability and interaction capabilities. Through extensive experimentation and validation, we demonstrate the efficacy and robustness of our approach, marking a significant advancement in humanoid robotics control systems.
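By way of illustration, a minimal PyTorch sketch of such an image-to-servo regressor follows (the backbone, layer sizes, and output normalization are assumptions, not the paper's architecture): a small CNN maps a face image to 22 normalized servo positions, and training would minimize the mean squared error against recorded servo vectors.

```python
import torch
import torch.nn as nn

class Face2Servo(nn.Module):
    """CNN regressor: face image -> 22 normalized servo positions."""
    def __init__(self, n_servos=22):
        super().__init__()
        self.backbone = nn.Sequential(
            nn.Conv2d(3, 32, 3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(32, 64, 3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
        )
        self.head = nn.Sequential(nn.Linear(64, 128), nn.ReLU(),
                                  nn.Linear(128, n_servos), nn.Sigmoid())

    def forward(self, img):                    # img: (B, 3, H, W)
        return self.head(self.backbone(img))   # (B, 22) in [0, 1]

# Training step (illustrative): regress toward recorded servo vectors
# loss = nn.functional.mse_loss(model(images), servo_targets)
```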
This study introduces a novel approach for generating high-quality, language-specific chat corpora using a self-chat mechanism. We combine a generator LLM for creating new samples and an embedder LLM to ensure diversity. A new Masked Language Modelling (MLM) model-based quality assessment metric is proposed for evaluating and filtering the corpora. Utilizing llama2-70b as the generator and a multilingual sentence transformer as the embedder, we generate an Italian chat corpus and refine the Fauno corpus, which is based on translated English ChatGPT self-chat data. The refinement uses structural assertions and Natural Language Processing techniques. Both corpora undergo a comprehensive quality evaluation using the proposed MLM model-based quality metric. The Italian LLM fine-tuned with these corpora demonstrates significantly enhanced language comprehension and question-answering skills. The resultant model, cerbero-7b, establishes a new state-of-the-art for Italian LLMs. This approach marks a substantial advancement in the development of language-specific LLMs, with a special emphasis on augmenting corpora for underrepresented languages like Italian.
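A minimal sketch of one plausible MLM-based quality score follows (the paper's exact metric and model choice are not reproduced here; the multilingual BERT checkpoint is an assumption): each token is masked in turn, and the average log-probability the model assigns to the original token serves as a fluency score, so higher values suggest higher-quality text.

```python
import torch
from transformers import AutoTokenizer, AutoModelForMaskedLM

tok = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")
mlm = AutoModelForMaskedLM.from_pretrained("bert-base-multilingual-cased").eval()

def mlm_quality(sentence: str) -> float:
    """Average log-likelihood of each token under the MLM, masking one at a time."""
    ids = tok(sentence, return_tensors="pt").input_ids[0]
    log_probs = []
    for i in range(1, len(ids) - 1):          # skip [CLS] and [SEP]
        masked = ids.clone()
        masked[i] = tok.mask_token_id
        with torch.no_grad():
            logits = mlm(masked.unsqueeze(0)).logits[0, i]
        log_probs.append(torch.log_softmax(logits, -1)[ids[i]].item())
    return sum(log_probs) / len(log_probs)    # higher = more fluent
```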
Background: Although hysteroscopy with endometrial biopsy is the gold standard in the diagnosis of endometrial pathology, the gynecologist's experience is crucial for a correct diagnosis. Deep learning (DL), as an artificial intelligence method, might help to overcome this limitation. Unfortunately, only preliminary findings are available, with an absence of studies evaluating the performance of DL models in identifying intrauterine lesions and the possible aid related to the inclusion of clinical factors in the model. Aim: To develop a DL model as an automated tool for detecting and classifying endometrial pathologies from hysteroscopic images. Methods: A monocentric observational retrospective cohort study was performed by reviewing clinical records, electronic databases, and stored videos of hysteroscopies from consecutive patients with pathologically confirmed intrauterine lesions at our Center from January 2021 to May 2021. Retrieved hysteroscopic images were used to build a DL model for the classification and identification of intracavitary uterine lesions, with or without the aid of clinical factors. Study outcomes were the DL model's diagnostic metrics in the classification and identification of intracavitary uterine lesions with and without the aid of clinical factors. Results: We reviewed 1500 images from 266 patients: 186 patients had benign focal lesions, 25 benign diffuse lesions, and 55 preneoplastic/neoplastic lesions. For both the classification and identification tasks, the best performance was achieved with the aid of clinical factors, with an overall precision of 80.11%, recall of 80.11%, specificity of 90.06%, F1 score of 80.11%, and accuracy of 86.74% for the classification task, and an overall detection rate of 85.82%, precision of 93.12%, recall of 91.63%, and F1 score of 92.37% for the identification task. Conclusion: Our DL model achieved a low diagnostic performance in the detection and classification of intracavitary uterine lesions from hysteroscopic images. Although the best diagnostic performance was obtained with the aid of clinical data, such an improvement was slight.
Today, globalized markets require more resilient and agile manufacturing systems, as well as customized and virtualized features. Classical self-standing manufacturing systems are evolving into collaborative networks such as Cloud Manufacturing (based on centralized knowledge and distributed resources) or Shared Manufacturing (based on fully decentralized knowledge and distributed resources) as a solution to ensure business continuity under normal as well as special circumstances. Additive Manufacturing (AM), one of the enablers of Industry 4.0 (I4.0), is a promising technology for innovative production models due to its inherent distributed capabilities, digital nature, and product customization ability. To increase the adaptivity of distributed resources using AM technology, this paper proposes a mechanism for sharing workload and resources under unexpected behaviours in the supply chain. Smart contracts and blockchain technology are used in this concept to provide decentralized, transparent, and trusted operation of such systems, making them more resilient to disruptive factors. The proposed Blockchain-based Shared Additive Manufacturing (BBSAM) protocol, ontology, and workflow for AM capacity pooling are discussed and analysed under special conditions such as anomalous demand. A discrete-time Python simulation, based on a real Italian AM market dataset, has been released on GitHub together with the dataset.
Most applications of wireless sensor networks require the continuous coverage of a region of interest. The irregular deployment of the nodes, or their failure, can result in holes in the coverage, thus jeopardizing this requirement. Methods to recover the sensing capabilities usually demand the availability of redundant full-fledged nodes, whose relocation should heal the holes. These solutions, however, consider neither the high cost of obtaining redundant, typically complex, devices, nor that these could in turn fail. In this work, we propose a bio-inspired and emergent approach to hole detection and healing using a swarm of resource-constrained agents with reduced sensing capabilities, whose behavior draws inspiration from the concepts underlying blood coagulation. The swarm follows three rules, activation, adhesion, and cohesion, adapted from the behavior exhibited by platelets during the human healing process. Relying only on local and relative information, the mobile agents can detect the borders of the holes and place themselves in locally optimal positions to temporarily restore the service. To validate the algorithm, we have developed a distributed, multi-process simulator. Experimental results show that the proposed method efficiently detects and heals the holes, outperforming two state-of-the-art solutions. It also demonstrates good robustness and flexibility in the face of agent failure.
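A toy sketch of the three coagulation-inspired rules follows (radii, gains, and the update form are illustrative, not the simulator's parameters): each agent activates when it senses the hole border, adheres by moving toward the nearest border point, and coheres by staying close to neighbouring agents.

```python
import numpy as np

R_SENSE, R_COH = 5.0, 2.5      # sensing and cohesion radii (illustrative)

def step(pos, border_pts, active, lr=0.1):
    """One swarm update; pos and border_pts are (N, 2) arrays of coordinates."""
    new_pos = pos.copy()
    for i, p in enumerate(pos):
        d_border = np.linalg.norm(border_pts - p, axis=1)
        if d_border.min() < R_SENSE:
            active[i] = True                          # activation: border sensed
        if not active[i]:
            continue
        move = border_pts[d_border.argmin()] - p      # adhesion: approach border
        neigh = pos[np.linalg.norm(pos - p, axis=1) < R_COH]
        if len(neigh) > 1:
            move += 0.3 * (neigh.mean(axis=0) - p)    # cohesion: stay with the clot
        new_pos[i] = p + lr * move / (np.linalg.norm(move) + 1e-9)
    return new_pos, active
```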
Structural health monitoring (SHM) using IoT sensor devices plays a crucial role in the preservation of civil structures. SHM aims at performing an accurate damage diagnosis of a structure, which consists of identifying, localizing, and quantifying the condition of any significant damage, to keep track of the relevant structural integrity. Deep learning (DL) architectures have been progressively introduced to enhance vibration-based SHM analyses: supervised DL approaches are integrated into SHM systems because they can provide very detailed information about the nature of damage compared to unsupervised DL approaches. The main drawback of the supervised approach is the need for human intervention to appropriately label data describing the nature of damage, considering that, in the SHM context, providing labeled data requires advanced expertise and a lot of time. To overcome this limitation, a key solution is a digital twin relying on physics-based numerical models to reproduce the structural response in terms of the vibration recordings provided by the sensor devices during the specific events to be monitored. This work presents a comprehensive methodology to carry out the damage localization task by exploiting a convolutional neural network (CNN) and parametric model order reduction (MOR) techniques to reduce the computational burden associated with the construction of the dataset on which the CNN is trained. Experimental results related to a pilot application involving a sample structure show the potential of the proposed solution and the reusability of the trained system in the presence of different loading scenarios.
Oral squamous cell carcinoma (OSCC) is a significant health issue in the oral cancer domain; a screening tool for timely and accurate diagnosis is essential for effective treatment planning and prognosis in patients' life expectancy. In this paper, we address the problem of object detection and classification in the context of OSCC by presenting a comparative analysis of three state-of-the-art architectures: YOLO, Faster R-CNN, and DETR. We propose a deep learning ensemble model to address both the object detection and classification problems, leveraging the strengths of the individual models to achieve higher performance than any single model. The proposed architecture was evaluated on a real-world dataset developed by experienced clinicians who manually labeled individual photographic images, producing a benchmark dataset. Results from our comparative analysis demonstrate that the ensemble detection model achieves superior performance compared to the individual models, outperforming the average value of the individual models' mAP@50 metric by 24% and of the mAP@50-95 metric by 44%.
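As a sketch of one common way to fuse detections across models (the paper's ensemble strategy may differ): detections from the three architectures are pooled, and a cross-model non-maximum suppression keeps the highest-scoring box among overlapping same-class boxes.

```python
import numpy as np

def iou(a, b):
    """Intersection-over-union of two [x1, y1, x2, y2] boxes."""
    xi1, yi1 = max(a[0], b[0]), max(a[1], b[1])
    xi2, yi2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, xi2 - xi1) * max(0.0, yi2 - yi1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter + 1e-9)

def ensemble_nms(detections, iou_thr=0.5):
    """detections: list of (box, score, label) pooled from all three models."""
    dets = sorted(detections, key=lambda d: -d[1])   # highest score first
    kept = []
    for box, score, label in dets:
        # keep the box only if no kept same-class box already overlaps it
        if all(l != label or iou(box, b) < iou_thr for b, _, l in kept):
            kept.append((box, score, label))
    return kept
```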
In the last decades, the effects of global warming combined with growing anthropogenic activities have caused a mismatch between water supply and demand, resulting in a negative impact on the regime of numerous Mediterranean rivers and on the functionality of related ecosystem services. Thus, for water management and mitigation of the potential hazards, it is fundamental to efficiently map the areal extent of river water surfaces. Synthetic Aperture Radar (SAR) is one of the satellite technologies applied in hydrological studies, but its spatial resolution is limited for the study of rivers. On the other hand, deep learning technology exhibits a high modelling potential with low spatial resolution data. In this paper, a method based on convolutional neural networks is applied to the SAR backscatter coefficient for detecting river water surfaces. Our experimental study focuses on the lower reach of the Mijares river (Eastern Spain), covering the period from April 2019 to September 2022. Results suggest that radar backscattering has high potential in modelling river water trends, contributing to the monitoring of the effects of climate change and the impacts on related ecosystem services. To assess the effectiveness of the method, the output has been validated with the Normalized Difference Water Index (NDWI).
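For reference, the NDWI used for validation can be computed as follows (a minimal sketch with placeholder rasters; the zero threshold is the usual convention, while band handling details are assumptions):

```python
import numpy as np

def ndwi(green: np.ndarray, nir: np.ndarray) -> np.ndarray:
    """NDWI = (Green - NIR) / (Green + NIR); values > 0 suggest open water."""
    return (green - nir) / (green + nir + 1e-9)

green_band = np.random.rand(256, 256)   # placeholder reflectance rasters
nir_band = np.random.rand(256, 256)
water_mask = ndwi(green_band, nir_band) > 0.0   # binary water surface map
```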
In recent years, electronic payment through Point-of-Sale (POS) systems has become popular. For this reason, POS devices are increasingly targeted by cyber attacks. In particular, RAM scraping malware is the most dangerous threat: the card data is extracted from the process memory, during the transaction and before the encryption, and sent to the attacker. This paper focuses on the possibility of detecting this kind of malware through anomaly detection based on Deep Learning with attention, using network traffic with data exfiltration occurrences. To show the effectiveness of the proposed approach, real POS transaction traffic has been used, together with real malware traffic extracted from a collection of RAM scrapers. Early results show the high potential of the proposed approach, encouraging further comparative research. To foster further development, the data and source code have been publicly released.
Background: This study aims to evaluate the diagnostic performance of a Deep Learning (DL) machine for the detection of adenomyosis on uterine ultrasonographic images and compare it to that of intermediate ultrasound-skilled trainees. Methods: A prospective observational study was conducted between 1 and 30 April 2022. Transvaginal ultrasound (TVUS) diagnosis of adenomyosis was investigated by an experienced sonographer on 100 fertile-age patients. Videoclips of the uterine corpus were recorded and sequential ultrasound images were extracted. Intermediate ultrasound-skilled trainees and the DL machine were asked to make a diagnosis by reviewing the uterine images. We evaluated and compared the accuracy, sensitivity, positive predictive value, F1-score, specificity and negative predictive value of the DL model and the trainees for adenomyosis diagnosis. Results: The accuracies of DL and intermediate ultrasound-skilled trainees for the diagnosis of adenomyosis were 0.51 (95% CI, 0.48-0.54) and 0.70 (95% CI, 0.60-0.79), respectively. Sensitivity, specificity and F1-score of DL were 0.43 (95% CI, 0.38-0.48), 0.82 (95% CI, 0.79-0.85) and 0.46 (95% CI, 0.42-0.50), respectively, whereas intermediate ultrasound-skilled trainees had a sensitivity of 0.72 (95% CI, 0.52-0.86), specificity of 0.69 (95% CI, 0.58-0.79) and F1-score of 0.55 (95% CI, 0.43-0.66). Conclusions: In this preliminary study, the DL model showed a lower accuracy but a higher specificity in diagnosing adenomyosis on ultrasonographic images compared to intermediate-skilled trainees.
Evaluating and comparing text-to-image models is a challenging problem. Significant advances in the field have recently been made, piquing the interest of various industrial sectors. As a consequence, a gold standard in the field should cover a variety of tasks and application contexts. In this paper, a novel evaluation approach is tested, based on: (i) a curated dataset of high-quality royalty-free image-text pairs, divided into ten categories; (ii) a quantitative metric, the CLIP-score; and (iii) a human evaluation task to distinguish, for a given text, the real image from the generated ones. The proposed method has been applied to the most recent models, i.e., DALL-E 2, Latent Diffusion, Stable Diffusion, GLIDE and Craiyon. Early experimental results show that the accuracy of the human judgement is fully coherent with the CLIP-score. The dataset has been made available to the public.
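A minimal sketch of the CLIP-score computation follows (using the openly available CLIP checkpoint on Hugging Face; the paper's exact scoring pipeline is not reproduced): the score is the cosine similarity between the CLIP embeddings of the prompt and of the image.

```python
import torch
from PIL import Image
from transformers import CLIPModel, CLIPProcessor

model = CLIPModel.from_pretrained("openai/clip-vit-base-patch32").eval()
proc = CLIPProcessor.from_pretrained("openai/clip-vit-base-patch32")

def clip_score(prompt: str, image: Image.Image) -> float:
    """Cosine similarity between CLIP text and image embeddings."""
    inputs = proc(text=[prompt], images=image, return_tensors="pt", padding=True)
    with torch.no_grad():
        out = model(**inputs)
    t = out.text_embeds / out.text_embeds.norm(dim=-1, keepdim=True)
    i = out.image_embeds / out.image_embeds.norm(dim=-1, keepdim=True)
    return float((t * i).sum())
```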
Structural health monitoring of buildings via agnostic approaches is a research challenge. However, since pervasive multi-sensor systems are a recent development, historical data samples are still limited. Consequently, data-driven methods are often unfeasible for long-term assessment. Nevertheless, some famous historical buildings have been monitored for decades, since before the advent of smart sensors and Deep Learning (DL). This paper presents a DL approach for the agnostic assessment of structural changes. The proposed approach has been tested on the stabilizing intervention carried out in 2000-2002 on the leaning tower of Pisa (Italy). The dataset consists of operational and environmental measures collected from 1993 to 2006. Both conventional and recent approaches are compared: Multiple Linear Regression, LSTM and Transformer. Experimental results are promising, and clearly show a better change sensitivity of the LSTM, as well as a better modeling accuracy of the Transformer.
Structural Health Monitoring (SHM) of civil structures using IoT sensors is a major emerging challenge. SHM aims to detect and identify any deviation from a reference condition, typically a damage-free baseline, to keep track of the relevant structural integrity. Machine Learning (ML) techniques have recently been employed to empower vibration-based SHM systems. Supervised ML tends to achieve better accuracy than unsupervised ML, but it requires human intervention to label data appropriately. However, labelled data related to damage conditions of civil structures are often unavailable. To overcome this limitation, a key solution is a digital twin relying on physics-based numerical models to simulate the structural response in terms of the vibration recordings provided by IoT devices during the events of interest, such as wind or seismic excitations. This paper presents such a comprehensive approach, here framed to address the task of damage localization, exploiting a Convolutional Neural Network (CNN). Early experimental results relevant to a pilot application involving a sample structure show the potential of the proposed approach, as well as the reusability of the trained system in the presence of varying loading scenarios.
The computer vision and object detection techniques developed in recent years dominate the state of the art and are increasingly applied to document layout analysis. In this research work, an automatic method to extract meaningful information from scanned documents is proposed, based on the most recent object detection techniques. Specifically, state-of-the-art deep learning techniques designed to work on images are adapted to the domain of digital documents. This research focuses on play scripts, a document type that has not been considered in the literature. For this reason, a novel dataset has been annotated, selecting the most common and useful formats from hundreds of available scripts. The main contribution of this paper is to provide a general understanding and a performance study of different implementations of object detectors applied to this domain. Deep neural networks, such as Faster R-CNN and YOLO, have been fine-tuned to identify text sections of interest via bounding boxes and to classify them into specific pre-defined categories. Several experiments have been carried out, applying different combinations of data augmentation techniques.
The increasing availability of satellite technology for Earth observation enables the monitoring of land subsidence, achieving large-scale and long-term situation awareness for supporting various human activities. Nevertheless, even with the most recent Interferometric Synthetic Aperture Radar (InSAR) technology, one of the main limitations is signal loss of coherence. This paper introduces a novel method and tool for increasing the spatial density of the surface motion samples. The method is agnostic and based on Transformers, a machine learning architecture with dominant performance and low calibration cost. The paper covers the development and experimentation on four years of surface subsidence (2017-2021) occurring in two Italian regions, Emilia-Romagna and Tuscany, due to groundwater over-pumping, using Sentinel-1 data processed with P-SBAS (Parallel Small Baseline Subset) time-series analysis. Experimental results clearly show the potential of the approach. The developed system has been publicly released to ensure reproducibility and foster scientific collaboration.
Mathematics is an effective testbed for measuring the problem-solving ability of machine learning models. The current benchmark for deep learning-based solutions is grade school math problems: given a natural language description of a problem, the task is to analyse the problem, exploit heuristics generated from a very large set of solved examples, and then generate an answer. In this paper, a descendant of the third generation of Generative Pre-trained Transformer Networks (GPT-3) is used to develop a zero-shot learning approach to solve this problem. The proposed approach shows that coding-based problem-solving is more effective than the natural-language-reasoning-based one. Specifically, the architectural solution is built upon OpenAI Codex, a descendant of GPT-3 for programming tasks, trained on public GitHub repositories, the world's largest source code hosting service. Experimental results clearly show the potential of the approach: by exploiting Python as the programming language, the proposed pipeline achieves an 18.63% solve rate, against the 6.82% of GPT-3. Finally, by using a fine-tuned verifier, the correctness of the answer can be ranked at runtime, and then improved by generating a predefined number of trials. With this approach, for 10 trials and an ideal verifier, the proposed pipeline achieves a 54.20% solve rate.
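A minimal sketch of this code-based solving loop follows (the prompt format and the `generate` completion function are assumptions standing in for the Codex API): the model is prompted to complete a Python function solving the problem, each completion is executed, and the resulting answers are collected for a verifier to rank.

```python
PROMPT = '''# Solve the following grade-school math problem in Python.
# Problem: {problem}
def solution():
'''

def solve(problem: str, generate, n_trials: int = 10):
    """'generate' is any LLM completion function: prompt -> code string."""
    prompt = PROMPT.format(problem=problem)
    answers = []
    for _ in range(n_trials):
        code = prompt + generate(prompt)     # full generated program
        scope = {}
        try:
            exec(code, scope)                # run the generated code
            answers.append(scope["solution"]())
        except Exception:
            continue                         # discard non-running completions
    return answers                           # a trained verifier would rank these
```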
This paper introduces a novel method and tools for groundwater modeling. The purpose is to perform numerical approximations of a groundwater system, for addressing water management problems and supporting decision-making processes. In the last decade, Data-driven Models (DdMs) have attracted increasing attention for their efficient development, made possible by modern remote and ground sensing and learning technologies. With respect to conventional Process-driven Models (PdMs), based on the mathematical modeling of core physical processes into a system of equations, a DdM requires less human effort and process-specific knowledge. The paper covers the design and simulation of a deep learning modeling tool based on Convolutional Neural Networks, integrated with the design and simulation of the workflow based on the Business Process Model and Notation (BPMN). Experimental results clearly show the potential of the novel approach for scientists and policy makers.
Conventional neural networks (NNs) for image classification make use of a convolutional layer and a feedforward (FF) classification layer. This paper presents a novel classification layer architecture and training paradigm, in which the FF layer is split into small, specialized FF nets called Noise Boosted Receptive Fields (NBRFs), one per class. Each i-th NBRF provides three membership degrees: to the i-th class, to the super class made up of its complementary classes, and to an extra class representing out-of-classes images. The training process artificially generates extra-class samples via image transformation and noise addition. Experimental results, carried out on the MNIST, KMNIST and FMNIST datasets, show that, with respect to an FF layer, the NBRF layer improves the robustness and accuracy of classification. The repository with the source code and experimental data has been publicly released to facilitate reproducibility and widespread adoption.
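A minimal PyTorch sketch of one NBRF follows (layer sizes are illustrative assumptions): a small FF net per class emits the three membership degrees, and one such field per class replaces the monolithic FF classification layer.

```python
import torch
import torch.nn as nn

class NBRF(nn.Module):
    """One Noise Boosted Receptive Field: three membership degrees for class i."""
    def __init__(self, in_features, hidden=32):
        super().__init__()
        self.net = nn.Sequential(
            nn.Linear(in_features, hidden), nn.ReLU(),
            nn.Linear(hidden, 3),           # [class_i, complement, extra]
        )

    def forward(self, x):
        return torch.softmax(self.net(x), dim=-1)

# One NBRF per class replaces the monolithic FF classification layer:
fields = nn.ModuleList([NBRF(in_features=128) for _ in range(10)])
```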
Managing water distribution networks via pump scheduling programs is a multi-objective optimization problem with dynamic and varied site-specific challenges. Metaheuristics-based approaches, with respect to mathematical solvers, offer data-driven strategies for manageable and adaptive control. Some evolutionary approaches are suitable for multi-criteria decision making and decentralized coordination on programmable logic controllers. This paper focuses on the development of a testbed and an early assessment of an approach based on NSGA-II and Pseudo-Weights. The experimental studies are based on a physically developed case study, and on a scalable case study with realistic water demand and source patterns. The testbed has been publicly released.
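A minimal sketch of the optimization core with the pymoo library follows (assuming pymoo ≥ 0.6; the ZDT1 benchmark and the weight vector stand in for the actual pump scheduling model and decision criteria): NSGA-II approximates the Pareto front, and Pseudo-Weights selects one compromise schedule from it.

```python
import numpy as np
from pymoo.algorithms.moo.nsga2 import NSGA2
from pymoo.problems import get_problem
from pymoo.optimize import minimize
from pymoo.mcdm.pseudo_weights import PseudoWeights

problem = get_problem("zdt1")           # stand-in for the pump scheduling model
res = minimize(problem, NSGA2(pop_size=100), ("n_gen", 200), seed=1)

# Pseudo-Weights picks one compromise solution from the Pareto front,
# e.g. weighting the two objectives 60/40 (illustrative criteria).
idx = PseudoWeights(np.array([0.6, 0.4])).do(res.F)
schedule = res.X[idx]                   # decision variables of the chosen plan
```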
In this research work we present CLIP-GLaSS, a novel zero-shot framework to generate an image (or a caption) corresponding to a given caption (or image). CLIP-GLaSS is based on the CLIP neural network, which, given an image and a descriptive caption, provides similar embeddings. Conversely, CLIP-GLaSS takes a caption (or an image) as input and generates the image (or the caption) whose CLIP embedding is the most similar to the input one. This optimal image (or caption) is produced via a generative network, after an exploration by a genetic algorithm. Promising results are shown, based on experiments with the image generators BigGAN and StyleGAN2, and the text generator GPT-2.
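A conceptual sketch of the search loop follows (the generator and CLIP embedding functions are stubs, and the simple truncation-selection GA stands in for the actual genetic algorithm used): latent codes are evolved so that the generated image's CLIP embedding approaches that of the target caption.

```python
import numpy as np

def fitness(z, generator, clip_image_embed, target_text_embed):
    """Similarity between the generated image and the caption in CLIP space."""
    e = clip_image_embed(generator(z))          # e.g. BigGAN / StyleGAN2 output
    return float(e @ target_text_embed)         # cosine (unit-norm embeddings)

def evolve(pop, fit_fn, generations=100, sigma=0.2):
    """Truncation-selection GA over latent codes; pop is a (P, d) array."""
    for _ in range(generations):
        scores = np.array([fit_fn(z) for z in pop])
        parents = pop[np.argsort(-scores)[: len(pop) // 2]]           # selection
        children = parents + sigma * np.random.randn(*parents.shape)  # mutation
        pop = np.concatenate([parents, children])
    return pop[np.argmax([fit_fn(z) for z in pop])]   # best latent code

# Usage: best_z = evolve(np.random.randn(64, 128),
#                        lambda z: fitness(z, generator, clip_image_embed, t))
```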
In this paper we investigate some of the issues that arise from the scalarization of the multi-objective optimization problem in the Advantage Actor Critic (A2C) reinforcement learning algorithm. We show how a naive scalarization leads to gradient overlapping, and we also argue that the entropy regularization term just injects uncontrolled noise into the system. We propose two methods: one to avoid gradient overlapping (NOG) while keeping the same loss formulation, and one to avoid the noise injection (TE) by generating action distributions with a desired entropy. A comprehensive pilot experiment has been carried out, showing how the proposed methods speed up training by 210%. We argue that the proposed solutions can be applied to all advantage-based reinforcement learning algorithms.
This paper proposes the Mesh Neural Network (MNN), a novel architecture which allows neurons to be connected in any topology, to efficiently route information. In MNNs, information is propagated between neurons through a state transition function. State and error gradients are then directly computed from state updates without backward computation. The MNN architecture and the error propagation schema are formalized and derived in tensor algebra. The proposed computational model can fully supply a gradient descent process, and is suitable for very large scale NNs, due to its expressivity and training efficiency, with respect to NNs based on back-propagation and computational graphs.
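A minimal numerical sketch of the forward dynamics follows (sizes and the input/output neuron assignment are illustrative assumptions): all neurons share one state vector, and a single weight matrix, whose zero entries encode missing connections, implements the state transition function.

```python
import numpy as np

def mnn_step(s, W, b, act=np.tanh):
    """State transition function: next state of every neuron at once."""
    return act(W @ s + b)

n = 16
W = np.random.randn(n, n) * 0.1   # arbitrary topology: zero entry = no connection
b = np.zeros(n)
s = np.zeros(n)
s[:4] = np.random.rand(4)         # clamp the input into the first 4 neurons
for _ in range(5):                # propagate information through the mesh
    s = mnn_step(s, W, b)
output = s[-2:]                   # read the last 2 neurons as outputs
```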
In this paper, a novel architecture of Recurrent Neural Network (RNN) is designed and experimented. The proposed RNN adopts a computational memory based on the concept of stigmergy. The basic principle of a Stigmergic Memory (SM) is that the activity of deposit/removal of a quantity in the SM stimulates the next activities of deposit/removal. Accordingly, subsequent SM activities tend to reinforce/weaken each other, generating a coherent coordination between the SM activities and the input temporal stimulus. We show that, in a problem of supervised classification, the SM encodes the temporal input in an emergent representational model, by coordinating the deposit, removal and classification activities. This study lays down a basic framework for the derivation of an SM-RNN. A formal ontology of SM is discussed, and the SM-RNN architecture is detailed. To appreciate the computational power of an SM-RNN, comparative NNs have been selected and trained to solve the MNIST handwritten digits recognition benchmark in its two variants: spatial (sequences of bitmap rows) and temporal (sequences of pen strokes).
A current research trend in neurocomputing involves the design of novel artificial neural networks incorporating the concept of time into their operating model. In this paper, a novel architecture that employs stigmergy is proposed. Computational stigmergy is used to dynamically increase (or decrease) the strength of a connection, or the activation level, of an artificial neuron when stimulated (or released). This study lays down a basic framework for the derivation of a stigmergic NN with a related training algorithm. To show its potential, some pilot experiments are reported. The XOR problem is solved by using a single stigmergic neuron with one input and one output. A static NN, a stigmergic NN, a recurrent NN and a long short-term memory NN have been trained to solve the MNIST digits recognition benchmark.
A significant phenomenon in microblogging is that certain occurrences of terms self-produce increasing mentions in the unfolding event. In contrast, other terms manifest a spike for each moment of interest, resulting in a wake-up-and-sleep dynamic. Since spike morphology and background vary widely between events, detecting spikes in microblogs is a challenge. An alternative is to detect the spikiness feature rather than the spikes themselves. We present an approach which detects and aggregates spikiness contributions by combining spike patterns, called archetypes. The soft similarity between each archetype and the time series of term occurrences is based on computational stigmergy, a bio-inspired scalar and temporal aggregation of samples. Archetypes are arranged into an architectural module called Stigmergic Receptive Field (SRF). The final spikiness indicator is computed through a linear combination of SRFs, whose weights are determined with Least Square Error minimization on a spikiness training set. The structural parameters of the SRFs are instead determined with the Differential Evolution algorithm, minimizing the error on a training set of archetypal series. Experimental studies have generated a spikiness indicator in a real-world scenario. The indicator has enhanced a cloud representation of social discussion topics, where spikier cloud terms appear more blurred.
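A minimal sketch of the final combination stage follows (SRF responses and targets are simulated placeholders): given the per-archetype SRF outputs on a training set, the combination weights minimizing the squared error against the target spikiness are obtained via ordinary least squares.

```python
import numpy as np

# Placeholder data: 500 time windows, 6 SRF (archetype) responses each
srf_outputs = np.random.rand(500, 6)
target = np.random.rand(500)            # ground-truth spikiness per window

# Least Square Error minimization of || srf_outputs @ w - target ||^2
weights, *_ = np.linalg.lstsq(srf_outputs, target, rcond=None)

spikiness = srf_outputs @ weights       # linear combination of SRF outputs
```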
Dense Information Retrieval (DIR) has recently gained attention due to the advances in deep learning-based word embedding. In particular, for historical languages such as Latin, a DIR task is appropriate although challenging, due to: (i) the complexity of managing searches using traditional Natural Language Processing (NLP); (ii) the availability of fewer resources with respect to modern languages; (iii) the large variation in usage among different eras. In this research, pre-trained transformer models are used as feature extractors, to carry out a search on a Latin Digital Library. The system computes embeddings of sentences using state-of-the-art models, i.e., Latin BERT and LaBSE, and uses cosine distance to retrieve the most similar sentences. The paper delineates the system development and summarizes an evaluation of its performance using a quantitative metric based on experts' per-query document rankings. The proposed design is suitable for other historical languages. Early results show the higher potential of the LaBSE model, encouraging further comparative research. To foster further development, the data and source code have been publicly released.
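A minimal sketch of the retrieval core follows (the corpus sentences are placeholders; the LaBSE checkpoint name refers to the publicly available sentence-transformers model): sentences are embedded once, and queries are ranked by cosine similarity.

```python
import numpy as np
from sentence_transformers import SentenceTransformer

model = SentenceTransformer("sentence-transformers/LaBSE")
corpus = ["Gallia est omnis divisa in partes tres.", "Alea iacta est."]
emb = model.encode(corpus, normalize_embeddings=True)   # embed the library once

def search(query: str, top_k: int = 5):
    """Return the top_k corpus sentences most similar to the query."""
    q = model.encode([query], normalize_embeddings=True)[0]
    scores = emb @ q                    # cosine similarity (unit vectors)
    return [(corpus[i], float(scores[i])) for i in np.argsort(-scores)[:top_k]]
```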