Open access peer-reviewed article

Real-Time, Non-Intrusive Fall Detection via Wi-Fi CSI: A Comparative Study of CNN-LSTM, GNN, and Transformer Models

Dit Preechakarnjanadit

Attaphongse Taparugssanagorn

This article is part of the Artificial Intelligence section.


Article Type: Research Paper

Date of acceptance: November 2025

Date of publication: December 2025

DOI: 10.5772/acrt20250099

Copyright: © 2025 The Author(s). Licensee: IntechOpen. License: CC BY 4.0


Table of contents


Introduction
Methodology
Experiments, Results, and Discussion
Conclusion and Future Directions
Acknowledgments
Author Contributions
Funding
Ethical statement
Data availability statement
Conflict of Interest

Abstract

Falls pose serious risks, particularly for the elderly, and existing monitoring methods like CCTV and wearables face privacy, comfort, and setup challenges. This study introduces a non-intrusive fall detection system using Wi-Fi Channel State Information (CSI) for Human Activity Recognition (HAR). The dataset was collected from two participants across multiple sessions and environments, comprising 700 falling events and 2,700 normal activity events (lying down, staying still, transitioning, and walking). We evaluate Convolutional Neural Network–Long Short-Term Memory (CNN-LSTM), Graph Neural Networks (GNN), and Transformer models on CSI data, with and without Principal Component Analysis (PCA). CNN-LSTM achieves the highest accuracy (94.85% with PCA) and performs robustly in real-time. It is validated across five activities—falling, lying down, staying still, transitioning, and walking—under both Line-of-Sight and Non-Line-of-Sight conditions, including an extreme case with the router and device in separate rooms. Four-class accuracy reaches 95.5%, with PCA helping reduce confusion between similar postures. The system uses a sliding window, real-time tcpdump capture, and a multithreaded Python pipeline, achieving an average latency of 2.3 seconds. To evaluate model reliability under limited resources, the models were trained on data from one participant and tested on the remaining participants, providing a proof-of-concept for generalization to unseen subjects. This low-cost, privacy-preserving solution suits smart rehabilitation and elderly care.

Keywords

  • channel state information

  • convolutional neural network

  • graph neural network

  • principal component analysis

  • transformer model

Author information

Introduction

Falls among the elderly pose a significant public health concern worldwide, contributing to major morbidity and mortality. Addressing this issue urgently is imperative, as falls often lead to acute physical deterioration, turning people capable of self-care into bedridden patients. Approximately one-third of people 65 years and older experience falls annually, resulting in injuries such as fractures, head injuries, and, in some cases, fatalities [1]. With global populations rapidly aging, the social and economic burden associated with fall-related injuries is expected to increase substantially. This reality highlights the urgent need for robust, accurate, and non-intrusive fall detection solutions that can be deployed at scale to support independent living and reduce healthcare costs.

Various fall detection technologies have emerged, broadly categorized as wearable sensors, camera-based systems, and Wi-Fi-based systems. Wearable sensors offer versatility but are inconvenient, as they require constant device carrying, which is particularly challenging for forgetful individuals [2, 3]. Camera-based systems, while effective, raise privacy concerns and may not be deployable in certain spaces like bathrooms [4, 5]. Wi-Fi-based systems, which leverage ambient radio signals for human activity recognition, present a compelling alternative due to their unobtrusiveness and potential for widespread indoor deployment without requiring user compliance [6]. In contrast to wearables, Wi-Fi sensing does not depend on user adherence or comfort, and unlike vision-based approaches, it avoids issues of privacy and limited visibility. These advantages make Wi-Fi CSI particularly attractive for continuous and unobtrusive monitoring in smart homes and healthcare environments.

This study focuses on a Wi-Fi-based fall detection system, addressing the limitations of traditional Received Signal Strength-based methods through the use of Channel State Information (CSI), which captures fine-grained signal characteristics across multiple subcarriers [6–8]. While wearable sensor systems demonstrate high detection accuracy (up to 98%) [9, 10], their practicality is limited. Similarly, camera-based systems can achieve 96% accuracy [11], but they are hindered by privacy and visibility issues. In contrast, CSI-based systems provide a balance of accuracy and user comfort, although they still face challenges in terms of robustness and generalization across diverse indoor environments [6–8].

Several recent studies have investigated CSI-based Human Activity Recognition (HAR) using diverse deep learning architectures. Lowe et al. (2022) implemented a Convolutional Long Short-Term Memory (LSTM) model on a Raspberry Pi 3B+ using CSI amplitude data, achieving 95% accuracy. However, the model was trained on a limited dataset (720 samples across six activity classes), lacked preprocessing, and struggled with real-time performance [12]. Similarly, Zhuravchak et al. (2021) employed InceptionTime and Bidirectional LSTM across three environments, attaining only 61% accuracy, underscoring the impact of small datasets and the absence of real-time evaluation [13].

Moshiri et al. [14] compared multiple deep models, including 1D-Convolutional Neural Network (CNN), 2D-CNN, LSTM, and BiLSTM with attention, on a Raspberry Pi 4, achieving up to 95% accuracy. However, their dataset contained only 420 samples, and the lack of preprocessing and online deployment limited model generalizability [14]. Likewise, Forbes et al. [15] used DeepConvLSTM in a controlled setting on Raspberry Pi 4, reaching 92% accuracy with 1,100 samples across 11 classes, but faced similar constraints in terms of dataset diversity and real-time applicability [15].

Yan et al. [16] adopted an Extreme Learning Machine trained on data from an Intel 5300 Network Interface Card (NIC) across two environments, achieving 94.2% accuracy from 4,400 samples. Despite the relatively larger dataset, the system lacked real-time recognition capability, limiting its practical deployment [16].

Research Gaps

A comprehensive analysis of previous studies in CSI-based HAR reveals several persistent limitations that hinder their scalability and real-world applicability. One major concern is the prevalent use of small, non-representative datasets, which heightens the risk of overfitting and undermines the models’ ability to generalize across diverse home layouts, occupant behaviors, and environmental conditions. For example, Lowe et al. [12], Moshiri et al. [14], and Forbes et al. [15] relied on datasets ranging from 420 to 1,100 samples, which limited the robustness of their models. Zhuravchak et al. [13] demonstrated that even state-of-the-art architectures like InceptionTime could only achieve limited accuracy when trained on small-scale data. Recently, Hu et al. [17] introduced a mobile Wi-Fi-CSI receiver system for construction workers, yet their dataset was limited to a small-scale simulated site, raising questions about scalability and generalizability. Youm and Go [18] explored lightweight Neural Architecture Search (NAS)-pruned models, but real-time performance in dynamic, uncontrolled environments remains untested. Bayad et al. [19] proposed a 2D-CNN system that generalizes to unseen environments and new participants, but their evaluation was still limited to a constrained indoor dataset with specific environmental variations, highlighting the need for broader, multi-site validation.

This issue is further compounded by the lack of evaluation in heterogeneous spatial settings, which ignores the variations introduced by different building materials, furniture arrangements, and human-induced dynamics, all of which affect the propagation of the Wi-Fi signal. Studies such as Yan et al. [16], Forbes et al. [15], Hu et al. [17], Youm and Go [18], and Bayad et al. [19] validated their systems in only one or two environments, limiting the generalizability of their models across different deployment scenarios.

Another critical gap lies in the insufficient preprocessing of raw CSI data. Many studies, including those by Moshiri et al. [14] and Lowe et al. [12], directly utilize unfiltered amplitude or phase information, which is often noisy and susceptible to phase shifts, multipath effects, and hardware-related artifacts. Without proper cleaning and transformation of the signal, these inconsistencies degrade the performance of the model and contribute to poor reproducibility. Even recent mobile receiver, NAS-pruned, and adaptive 2D-CNN models [17–19] often rely on preprocessed datasets or offline analysis, limiting insights into end-to-end real-time deployment.

Equally significant is the gap in real-time system deployment. Despite the growing need for continuous and passive monitoring, especially in applications such as eldercare, most proposed systems are validated only under controlled lab conditions, with no integration into real-time pipelines. For example, although Yan et al. [16] used a relatively large dataset, the lack of real-time classification limited the system’s utility for real-world applications. Similar limitations in online processing were observed in the works of Zhuravchak et al. [13], Forbes et al. [15], and Moshiri et al. [14]. Recent studies [20, 21] have advanced representation learning and cross-domain adaptation in HAR. However, significant research gaps remain: real-time HAR under robust Wi-Fi CSI preprocessing, end-to-end deployment in dynamic or unseen environments, and integration of large-scale self-supervised pretraining for practical scenarios are still largely unsolved problems. Youm and Go [18] improved efficiency via NAS-pruning, but real-world tests were not conducted. Bayad et al. [19] demonstrated adaptability to unseen environments and individuals but primarily in small-scale indoor settings, leaving large-scale deployment unexamined.

In summary, existing studies frequently suffer from: (1) limited and environment-specific datasets [12, 14, 15, 17–21]; (2) inadequate preprocessing of CSI signals [12, 14, 17–19]; (3) absence of real-time activity recognition [13, 16–21]; and (4) narrow experimental validation in realistic settings [15–19]. These gaps point to the urgent need for holistic solutions that combine diverse data collection, rigorous signal preprocessing, and real-time deep learning pipelines to ensure robust and scalable performance.

Our study directly addresses these challenges by incorporating a more diverse and realistic dataset, applying tailored preprocessing techniques to refine raw CSI data, and developing an end-to-end deep learning model capable of real-time activity classification. Compared to prior work—including mobile receiver systems [17], NAS-pruned lightweight models [18], and adaptive 2D-CNNs for unseen environments [19]—our approach integrates real-time deployment, robust preprocessing, and evaluation across multiple environments, demonstrating both practical applicability and improved generalizability in Wi-Fi-based HAR. This approach aims to bridge the gap between laboratory-based experimentation and real-world deployment, ultimately improving the reliability and practicality of Wi-Fi-based HAR systems.

Our Contribution

This research presents a practical and robust approach to real-time fall detection using Wi-Fi-based HAR, addressing limitations in prior studies [12, 15, 18–22] that often rely on offline analysis, preprocessed datasets, simulations, or expensive hardware. Unlike these works, our system directly processes raw CSI in realistic indoor settings, achieves end-to-end real-time performance, and is validated on low-cost, widely available hardware. Each design choice emphasizes practicality, robustness, and reproducibility for real-world deployment.

  • Low-cost, real-time HAR system: uses only an off-the-shelf Wi-Fi router and Raspberry Pi 4, eliminating the need for expensive sensors or specialized hardware. Previous studies [17, 19] often require multiple antennas, mobile devices, or costly experimental setups, limiting accessibility for everyday applications.

  • End-to-end solution: integrates data collection, CSI preprocessing, model training, and deployment in one seamless pipeline. Many works [12, 21, 22] rely on offline datasets or simulations, making them unsuitable for real-time monitoring.

  • Beacon-frame CSI acquisition: uses only passive Wi-Fi beacon frames to generate CSI, avoiding interference with existing network traffic. Other approaches often require active transmissions or additional devices [15, 18], which increases complexity and deployment cost.

  • Robust dataset and preprocessing workflow: captures both Line-of-Sight (LOS) and Non-Line-of-Sight (NLOS) scenarios and uses subcarrier selection, Hampel and Savitzky–Golay (SG) filtering, and normalization to improve signal quality. Prior studies [20, 21] often rely on preprocessed or idealized datasets, limiting generalization to realistic indoor environments.

  • Validated lightweight deep learning architectures (CNN-LSTM, 2D-CNN): CNN-LSTM learns spatial CSI patterns and temporal dependencies simultaneously, which is essential for complex activity recognition. Transformer and GNN approaches [18, 19] are often unverified on real CSI data, raising doubts about practical utility.

  • Sliding window mechanism: overlapping segments improve prediction smoothness and reduce errors during activity transitions. Most related works do not explicitly handle temporal overlap, which can lead to misclassification at activity boundaries [6, 23].

  • Feature enhancement with PCA: reduces dimensionality and improves feature clarity, particularly for activities with similar CSI patterns (e.g., falling vs. lying). Many previous methods ignore architecture-specific preprocessing [13, 17], leading to lower accuracy under NLOS conditions.

  • Continuous real-time pipeline: automates packet capture, conversion, preprocessing, and classification with Python threading and watchdog tools to maintain low-latency operation. Other approaches typically focus only on model evaluation offline, without demonstrating full real-time deployment [12, 20].

  • Foundation for practical applications: the validated, low-cost, and end-to-end tested system enables deployment in smart homes, healthcare, and assisted living environments. Prior works provide algorithmic or simulation results [21, 22], but rarely demonstrate practical applicability in realistic indoor settings.

Methodology

This section presents the methodology employed in developing a real-time HAR system utilizing Wi-Fi CSI data. The section begins with a concise mathematical description of CSI and its relationship to the wireless channel and human activity. This is followed by an overview of the overall workflow, the hardware and software configurations used, and the data acquisition process. Subsequently, preprocessing techniques applied to the raw data are discussed, followed by the classification process, including model training strategies and performance evaluation. The section concludes with the design and implementation of the real-time HAR system, enabling continuous activity detection.

Figure 1 illustrates the complete workflow of this study, detailing key phases in developing the real-time HAR system. The workflow encompasses initial research, planning, data acquisition, preprocessing, model construction, and real-time deployment. Each phase builds on the previous one, ensuring a systematic and reproducible approach.

Figure 1.

Complete workflow of the study, detailing the sequential phases from initial research and planning to data acquisition, preprocessing, model development, and real-time deployment.

Mathematical Formulation of CSI

CSI captures the frequency response of a wireless channel, describing how transmitted signals are affected by multipath propagation and environmental changes, including human movements. Let the transmitted signal be $x(t)$; the received signal can be expressed as

$$y(t) = h(t) * x(t) + n(t),$$

where $h(t)$ is the channel impulse response (CIR), $*$ denotes convolution, and $n(t)$ is additive noise. The CIR can be further represented in discrete form across subcarriers as

$$H(k) = |H(k)|\, e^{j\phi(k)}, \quad k = 1, \dots, K,$$

where $|H(k)|$ and $\phi(k)$ denote the amplitude and phase of the $k$-th subcarrier, respectively. CSI captures these parameters, reflecting the effects of multipath propagation, attenuation, and changes in the environment, such as human motion. By monitoring variations in CSI over time, human activities can be inferred in a non-intrusive manner, forming the basis of Wi-Fi-based HAR systems.
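As an illustration, the amplitude and phase terms above can be extracted from complex CSI samples in a few lines of NumPy (the sample values below are synthetic, not real measurements):

```python
import numpy as np

# Synthetic complex CSI samples H(k) for three subcarriers (illustrative only).
csi = np.array([1 + 1j, 0.5 - 0.5j, 2 + 0j])

amplitude = np.abs(csi)    # |H(k)| per subcarrier: sqrt(2), sqrt(0.5), 2
phase = np.angle(csi)      # phi(k) per subcarrier, in radians

print(amplitude)
print(phase)
```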

System Architecture and CSI Acquisition Setup

We develop a practical and cost-effective Wi-Fi-based HAR system using a Raspberry Pi 4, a standard Wi-Fi router (TP-Link Archer AX23), and a laptop. This setup balances affordability, portability, ease of deployment, and sufficient processing capability, making it suitable for real-world applications.

The Raspberry Pi acts as the CSI receiver. Its built-in Broadcom Wi-Fi chip (BCM43455C0), compact design, and General-Purpose Input/Output (GPIO) support make it ideal for edge sensing in constrained environments. Since the default operating system lacks native CSI access, we install DragonOS Pi64 (kernel 5.4.0-1033-raspi), a Linux distribution tailored for signal processing, and integrate Nexmon, an open-source firmware patch that enables CSI extraction by modifying Wi-Fi drivers. Nexmon also allows beacon frame filtering (frame control 0x80) and Media Access Control (MAC)-based capture, ensuring only relevant, high-quality data is collected.

The TP-Link router is configured to operate on the 5 GHz band, channel 36, with an 80 MHz bandwidth, broadcasting beacon frames at a fixed 50 ms interval (~20 Frames Per Second (FPS)). This setup ensures regular, passive CSI sampling without generating additional traffic, minimizing interference and simplifying deployment.

Each CSI segment comprises 100 consecutive beacon frames (~5 seconds), capturing both transient and steady-state features of human activities. This window length balances responsiveness and pattern richness for classification.
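The segmentation step can be sketched as follows; the 100-frame window comes from the text above, while the 50-frame hop (50% overlap) is an assumption for illustration:

```python
import numpy as np

def segment_csi(frames: np.ndarray, window: int = 100, hop: int = 50) -> np.ndarray:
    """Slice a (num_frames, num_subcarriers) CSI stream into overlapping
    fixed-length windows. hop=50 yields 50% overlap between segments
    (the exact hop size used in the paper is an assumption here)."""
    starts = range(0, frames.shape[0] - window + 1, hop)
    return np.stack([frames[s:s + window] for s in starts])

# Illustrative stream: 300 beacon frames, 52 subcarriers of dummy amplitudes.
stream = np.random.default_rng(0).random((300, 52))
segments = segment_csi(stream)
print(segments.shape)  # (5, 100, 52)
```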

Data are offloaded to a laptop for processing and model training, leveraging its superior computational power. This edge-backend separation improves real-time performance, reduces latency, and allows the use of more complex models without overloading the embedded hardware.

Overall, the hybrid architecture—lightweight at the edge, powerful at the backend—provides a scalable and flexible solution for CSI-based HAR, aligning with the growing need for affordable and deployable sensing systems in smart home and healthcare environments.

Data Preprocessing

To ensure the reliability and effectiveness of machine learning models trained on raw CSI data acquired from Raspberry Pi devices, rigorous preprocessing is essential. Raw CSI measurements are inherently noisy and may contain irrelevant or redundant information that can negatively impact classification accuracy. Through a structured preprocessing pipeline, the CSI data can be transformed into a stable and informative representation, thereby enhancing the robustness and predictive performance of subsequent models.

Removal of Null and Pilot Subcarriers

According to the IEEE 802.11ac standard, an 80 MHz channel comprises 256 subcarriers, of which 234 are allocated for data transmission, while the remaining 22 are designated as non-data subcarriers—14 as null or guard bands (including the Direct Current (DC) subcarrier) and 8 as pilot subcarriers. As summarized in Table 1, the pilot subcarriers are positioned at specific indices to aid synchronization and channel estimation, whereas null subcarriers help minimize interference between adjacent channels and preserve orthogonality.

Bandwidth | No. of Subcarriers | Transmitting Subcarrier Indices
80 MHz | 256 | −122 to −2 and 2 to 122

Table 1.

Subcarrier allocation in IEEE 802.11ac (80 MHz bandwidth) [24].

Note: Pilot subcarriers at indices −103, −75, −39, −11, 11, 39, 75, 103, and DC subcarriers at −1, 0, and 1 are excluded.


Although these subcarriers serve important functions during signal transmission, they often carry arbitrary or non-informative values in the captured CSI dataset. Consequently, retaining them during the preprocessing phase may introduce noise or misleading patterns that hinder model learning. Therefore, a standard preprocessing step is to exclude these non-data subcarriers to ensure the input features are both meaningful and representative of the true communication channel dynamics.
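The exclusion step can be sketched as below, using the index ranges from Table 1. Note that how array positions map to signed subcarrier indices depends on the capture tool's FFT ordering, so the simple −128…127 mapping here is an assumption:

```python
import numpy as np

NUM_SC = 256                       # 802.11ac 80 MHz FFT size
idx = np.arange(-128, 128)         # assumed signed subcarrier indices

pilots = {-103, -75, -39, -11, 11, 39, 75, 103}
dc = {-1, 0, 1}
# Data band per Table 1: indices -122..-2 and 2..122, minus pilots.
data_band = (np.abs(idx) >= 2) & (np.abs(idx) <= 122)
keep = data_band & ~np.isin(idx, list(pilots | dc))

csi_frame = np.random.default_rng(0).random(NUM_SC)  # one dummy CSI frame
clean = csi_frame[keep]
print(keep.sum())  # 234 data subcarriers remain, matching the standard
```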

Elimination of Low-Energy Subcarriers

While the 802.11ac protocol enables wideband (80 MHz) operation, beacon frames—typically used for channel probing—are transmitted over the primary 20 MHz sub-band. Nonetheless, the CSI collection process captures all 256 subcarriers, including those beyond the primary transmission band. The majority of these additional subcarriers exhibit negligible signal energy and are not actively used for data transmission during beacon frame emission.

Including such low-value or void subcarriers in the dataset can obscure meaningful signal structures, add unnecessary dimensionality, and degrade feature extraction quality. Therefore, only the subcarriers corresponding to the actual 20 MHz transmission band are retained to reduce noise and maintain data relevance.

Outlier Detection and Removal

Outliers in time-series CSI data may stem from hardware anomalies, environmental disturbances, or random fluctuations during signal measurement. To address this, we employ the Hampel filter, a robust technique for outlier detection that leverages median-based statistics rather than the mean and standard deviation, which are susceptible to distortion by extreme values. The Hampel filter is particularly well suited for CSI signals because it can effectively suppress short-term bursts and non-Gaussian noise while preserving the underlying temporal structure of the data.

Given a sequence $x_1, x_2, \dots, x_n$, a sliding window of length $2k+1$ is centered around each sample $x_i$. Within this window, the local median is computed as

$$m_i = \mathrm{median}(x_{i-k}, \dots, x_{i+k}).$$

The Median Absolute Deviation (MAD) within the window is then defined as

$$\mathrm{MAD}_i = \mathrm{median}\left(|x_{i-k} - m_i|, \dots, |x_{i+k} - m_i|\right).$$

A standardized score is calculated for each sample as

$$S_i = \frac{|x_i - m_i|}{\kappa \cdot \mathrm{MAD}_i},$$

where $\kappa = 1.4826$ makes the MAD a consistent estimator of the standard deviation for normally distributed data. Values for which $S_i > \tau$ (typically $\tau = 3$) are flagged as outliers and replaced with the local median $m_i$. This filtering preserves signal continuity while mitigating the influence of anomalous spikes.
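A straightforward reference implementation of the Hampel filter, using the typical window half-width and threshold values noted above:

```python
import numpy as np

def hampel(x: np.ndarray, k: int = 3, tau: float = 3.0) -> np.ndarray:
    """Replace samples deviating more than tau scaled-MAD units from the
    local median (window of 2k+1 samples, truncated at the edges).
    kappa = 1.4826 makes the MAD consistent with the Gaussian std."""
    kappa = 1.4826
    y = x.copy()
    for i in range(len(x)):
        lo, hi = max(0, i - k), min(len(x), i + k + 1)
        window = x[lo:hi]
        med = np.median(window)
        mad = np.median(np.abs(window - med))
        if mad > 0 and abs(x[i] - med) > tau * kappa * mad:
            y[i] = med  # spike: replace with the local median
    return y

sig = np.array([1.0, 1.1, 0.9, 25.0, 1.0, 1.05, 0.95])  # one synthetic spike
print(hampel(sig))  # the 25.0 spike is replaced; other samples are untouched
```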

Noise Reduction and Signal Smoothing

To further enhance signal quality, the SG filter is employed for smoothing. Unlike simple moving averages, the SG filter performs a local polynomial regression across a sliding window to fit the data more accurately while preserving important features such as peaks and curvature.

For a window of size $2m+1$, the smoothed value at index $i$, denoted $\hat{x}_i$, is obtained via

$$\hat{x}_i = \sum_{j=-m}^{m} c_j \, x_{i+j},$$

where $c_j$ are precomputed filter coefficients derived from a least-squares polynomial fit. By adjusting the polynomial degree and window size, the filter achieves an optimal balance between noise suppression and fidelity to original signal structures. This is particularly useful for preserving the dynamics of gesture-induced variations in the CSI signal.
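The local polynomial regression can be sketched directly with a per-window least-squares fit. This minimal version leaves edge samples unsmoothed and is not an optimized implementation (production code would typically use a library routine such as SciPy's `savgol_filter`):

```python
import numpy as np

def savitzky_golay(x: np.ndarray, window: int = 7, poly: int = 2) -> np.ndarray:
    """Minimal SG smoother: least-squares fit a degree-`poly` polynomial
    over each centered window and evaluate it at the window center.
    Edge samples are left unchanged for simplicity."""
    half = window // 2
    y = x.astype(float).copy()
    t = np.arange(-half, half + 1)
    for i in range(half, len(x) - half):
        coeffs = np.polyfit(t, x[i - half:i + half + 1], poly)
        y[i] = np.polyval(coeffs, 0)
    return y

# A degree-2 polynomial passes through the filter unchanged, which is the
# defining feature-preserving property of SG smoothing.
quadratic = np.arange(20, dtype=float) ** 2
assert np.allclose(savitzky_golay(quadratic), quadratic)
```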

Data Normalization

Normalization is a crucial step that ensures uniformity in feature scaling. Without normalization, features with larger numeric ranges may disproportionately influence model training. To prevent such bias and to accelerate convergence during optimization, we apply min-max normalization, transforming each value according to

$$x' = \frac{x - x_{\min}}{x_{\max} - x_{\min}}.$$

This transformation maps all feature values to the interval $[0, 1]$, improving model stability and comparability across features.
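In code, the transformation is a one-liner applied per feature column:

```python
import numpy as np

def minmax(x: np.ndarray) -> np.ndarray:
    """Min-max normalization to [0, 1], computed column-wise."""
    xmin, xmax = x.min(axis=0), x.max(axis=0)
    return (x - xmin) / (xmax - xmin)

# Two features with very different ranges map onto the same [0, 1] scale.
X = np.array([[1.0, 10.0],
              [2.0, 20.0],
              [3.0, 30.0]])
print(minmax(X))  # [[0. 0.], [0.5 0.5], [1. 1.]]
```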

Dimensionality Reduction via Principal Component Analysis

High-dimensional CSI data can be computationally intensive and may include redundant or highly correlated features. To mitigate this, Principal Component Analysis (PCA) is applied to extract the most informative components while reducing dimensionality.

Let $X \in \mathbb{R}^{n \times d}$ represent the mean-centered dataset. The covariance matrix is calculated as

$$C = \frac{1}{n-1} X^{\top} X.$$

PCA solves the eigenvalue problem

$$C \mathbf{v}_i = \lambda_i \mathbf{v}_i,$$

where $\lambda_i$ and $\mathbf{v}_i$ are the eigenvalues and corresponding eigenvectors. The eigenvectors define new axes (principal components) along which the data exhibit maximum variance. The top $p$ eigenvectors form a projection matrix $W \in \mathbb{R}^{d \times p}$, which is used to transform the original dataset as

$$Z = X W,$$

yielding a lower-dimensional representation $Z \in \mathbb{R}^{n \times p}$. This step enhances model efficiency by retaining the most salient features while discarding noise and redundancy [25].
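The eigen-decomposition route above can be reproduced in a few lines (in practice a library implementation such as scikit-learn's `PCA` would typically be used):

```python
import numpy as np

def pca(X: np.ndarray, p: int):
    """PCA via eigen-decomposition of the covariance matrix.
    Returns the projected data Z (n x p) and projection matrix W (d x p)."""
    Xc = X - X.mean(axis=0)              # mean-center the data
    C = Xc.T @ Xc / (X.shape[0] - 1)     # covariance matrix
    eigval, eigvec = np.linalg.eigh(C)   # eigenpairs, ascending eigenvalues
    order = np.argsort(eigval)[::-1][:p] # keep the top-p components
    W = eigvec[:, order]
    return Xc @ W, W

# Illustrative: reduce 10 correlated features to 3 principal components.
X = np.random.default_rng(0).normal(size=(200, 10))
Z, W = pca(X, p=3)
print(Z.shape)  # (200, 3)
```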

Classification

Classifying human activities from CSI data is a vital part of building a real-time HAR system. After collecting and preprocessing the data, we explore different deep learning models to identify the best trade-off between accuracy and real-time performance for practical use. We focus on the following three architectures: CNN-LSTM, Graph Neural Networks (GNN), and Transformers, because each offers unique strengths tailored to the complex spatio-temporal nature of Wi-Fi CSI signals.

CNN-LSTM models combine two powerful components: convolutional layers and recurrent units. The convolutional layers act as spatial feature extractors, scanning the CSI matrix to detect local patterns across subcarriers, which are crucial for identifying characteristic signal changes caused by different human activities. The LSTM layers model the temporal dependencies by remembering sequences of extracted features over time, capturing how activities unfold dynamically. This combination makes CNN-LSTM highly suitable for our study, as it balances spatial detail with temporal context, which is essential for recognizing complex movements such as falls or transitions between actions.

By contrast, GNNs treat CSI data as a graph structure, where nodes represent different signal points and edges reflect their relationships. This framework allows GNNs to learn spatial dependencies in a flexible non-Euclidean domain, which is especially useful for capturing subtle interactions in multipath and cluttered environments typical of indoor Wi-Fi signals. The message-passing and aggregation mechanisms of GNNs help the model adapt to complex topologies and spatial correlations that traditional CNNs might miss, making them a promising choice for nuanced activity patterns in realistic settings.

Transformers rely on a self-attention mechanism that computes weighted dependencies between all elements in a sequence simultaneously. This architecture excels at modeling long-range temporal relationships without the limitations of sequential memory in RNNs or LSTMs. Transformers’ multi-head attention enables the model to focus on different parts of the CSI sequence in parallel, efficiently extracting relevant features for activity classification. Their ability to handle high-dimensional, sequential data with global context makes Transformers an attractive candidate for HAR, especially as activity signals often involve long and complex temporal patterns.

By studying these models, we aim to benchmark their classification accuracy and computational efficiency, identifying the architecture that best fits real-time HAR with Wi-Fi CSI. This thorough comparison highlights how each model’s internal mechanisms align with the unique challenges of CSI data—spatial complexity, temporal dynamics, and noise robustness—and helps us select the most practical and effective approach for deployment in real-world scenarios.

Model Training Process

The model training process begins with the collected CSI dataset, organized into individual Comma-Separated Values (CSV) files, each representing a specific activity class. The dataset is carefully split into training and validation subsets, with an 80:20 ratio to ensure that the model’s performance is accurately assessed on previously unseen data. This partitioning helps mitigate the risk of overfitting and ensures that the model generalizes well to new instances beyond the training data. Hyperparameter optimization plays a key role in achieving an optimal balance between model learning efficiency and its generalization capabilities. Parameters such as learning rate, batch size, and number of layers are fine-tuned to achieve the best possible results while avoiding excessive complexity that may lead to overfitting.

To further improve the model’s generalization, dropout regularization is employed. This technique randomly deactivates a subset of neurons during training, preventing the model from becoming overly reliant on any particular set of features and ensuring adaptability to new data. Additionally, early stopping is implemented to halt the training process when the validation loss ceases to improve. This method prevents unnecessary computational overhead and preserves the model’s ability to adapt to new data, ensuring that it does not excessively tune itself to the training dataset.

This training methodology ensures that the model does not just perform well on the training set but also holds the potential to generalize to previously unseen data, a critical aspect for deployment in real-world scenarios.
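The early-stopping criterion described above can be expressed as a small framework-agnostic helper; the patience and loss values here are assumptions for illustration, not the paper's actual settings:

```python
class EarlyStopping:
    """Stop training when validation loss has not improved for
    `patience` consecutive epochs (minimal sketch)."""

    def __init__(self, patience: int = 3, min_delta: float = 0.0):
        self.patience, self.min_delta = patience, min_delta
        self.best = float("inf")
        self.bad_epochs = 0

    def step(self, val_loss: float) -> bool:
        """Record one epoch's validation loss; return True to stop."""
        if val_loss < self.best - self.min_delta:
            self.best, self.bad_epochs = val_loss, 0  # improvement: reset
        else:
            self.bad_epochs += 1
        return self.bad_epochs >= self.patience

stopper = EarlyStopping(patience=3)
losses = [1.0, 0.8, 0.7, 0.71, 0.72, 0.73]  # validation loss stalls
flags = [stopper.step(l) for l in losses]
print(flags)  # [False, False, False, False, False, True]
```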

Performance Evaluation

A comprehensive performance evaluation framework is employed to measure the efficacy of the trained models. Both quantitative and qualitative metrics are used to assess the model’s performance. Quantitative metrics include accuracy, precision, recall, and F1 score, all derived from the confusion matrix. These metrics provide a well-rounded view of the model’s ability to correctly identify activities, its sensitivity to detecting different classes, and its ability to balance between precision and recall.

Accuracy is often the first metric of interest, but in activity recognition tasks, precision and recall are equally essential, particularly in scenarios where certain activities may be underrepresented in the dataset. The F1 score, the harmonic mean of precision and recall, serves as a balanced measure to evaluate overall performance, particularly in imbalanced datasets where one activity may be more prevalent than others.
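These confusion-matrix-derived metrics can be computed directly; the matrix below is illustrative, not a result from this study:

```python
import numpy as np

def metrics_from_confusion(cm: np.ndarray):
    """Accuracy plus per-class precision, recall, and F1 from a confusion
    matrix whose rows are true classes and columns are predictions."""
    tp = np.diag(cm).astype(float)
    precision = tp / cm.sum(axis=0)   # correct / all predicted as class
    recall = tp / cm.sum(axis=1)      # correct / all true members of class
    f1 = 2 * precision * recall / (precision + recall)
    accuracy = tp.sum() / cm.sum()
    return accuracy, precision, recall, f1

# Illustrative two-class matrix (e.g. fall vs. non-fall).
cm = np.array([[45, 5],
               [10, 40]])
acc, p, r, f1 = metrics_from_confusion(cm)
print(round(acc, 2))  # 0.85
```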

In addition to these quantitative metrics, qualitative evaluation is conducted by examining the loss and accuracy curves over training epochs. These curves offer valuable insights into the model’s learning process, helping to identify signs of overfitting or underfitting. If training accuracy significantly surpasses validation accuracy, overfitting may be occurring, whereas the opposite trend may indicate underfitting. By analyzing these trends, adjustments can be made to improve the model’s robustness and generalization capability.

Together, these evaluation methods ensure that the model not only performs well during the training phase but also retains its effectiveness on unseen data, ensuring its viability for real-time HAR applications.

Experiments, Results, and Discussion

This section presents the experimental framework designed to evaluate the performance of the proposed HAR system. It begins with a description of the equipment setup and experimental configuration, followed by an explanation of the data collection process. Next, the preprocessing steps applied to the raw CSI data are discussed to ensure consistency and quality. The section proceeds by outlining the training procedures and performance evaluation of several models, including CNN-LSTM, GNNs, and Transformer models, both with and without PCA-transformed data. The section concludes with a comparative summary of the model performance and the implementation and validation of the real-time HAR system.

Equipment Setup

The implementation of the HAR system required a carefully designed setup on both the transmitter and receiver sides to enable the acquisition of high-quality CSI data. On the transmitter side, a Wi-Fi router was configured to operate in the 5 GHz frequency band, specifically using channel 36 with an 80 MHz bandwidth. The Beacon Interval was set to 50 milliseconds to ensure consistent and periodic transmission of control frames.

On the receiver side, a Raspberry Pi 4 was employed and installed with the Nexmon firmware to enable low-level access for monitoring Wi-Fi signals and capturing CSI data. The setup began by generating a CSI parameter string using the following command:

makecsiparams -c 36/80 -C 1 -N 1 -m {router MAC address} -b 0x80

This command configured the system to collect CSI data on channel 36 with 80 MHz bandwidth, utilizing the first processing core and spatial stream of the Raspberry Pi. It also ensured the capture of Beacon frames by specifying the identifier and filtering packets based on the router’s MAC address.

To ensure that the system operated exclusively in monitor mode and remained unaffected by standard client-side processes, the wpa_supplicant service—responsible for managing Wi-Fi connections—was terminated with the following command:

pkill wpa_supplicant

Subsequently, the wireless interface was reactivated to confirm its operational state:

ifconfig wlan0 up

Next, the Nexutil tool was configured to initiate CSI extraction using the generated parameter string:

nexutil -Iwlan0 -s500 -b -l34 -v {parameters from makecsiparams}

To facilitate passive Wi-Fi monitoring, a dedicated monitor interface was created with the following commands:

iw phy $(iw dev wlan0 info | gawk '/wiphy/ {printf "phy" $2}') \
    interface add mon0 type monitor

ifconfig mon0 up

This complete configuration enabled the Raspberry Pi to continuously monitor the wireless medium in passive mode, thereby allowing uninterrupted and high-fidelity collection of CSI data without interference from active client communication.

Experimental Setup

The CSI dataset used in this study was collected in two distinct environments designed to simulate various real-world conditions for HAR. Figure 2 provides the floor plans of these environments, showing the layout of the transmitter, receiver, and the paths along which human activities were performed. The selection of these environments was intentional, aiming to capture a wide range of activity types in different spatial configurations. By employing distinct environments, the dataset represents diverse scenarios, challenging the HAR system to generalize well across varying environmental conditions.

Figure 2.

Floor plans of the two experimental environments, showing the placement of the Wi-Fi transmitter and Raspberry Pi receiver. The setups include same-room and different-room configurations with both LOS and NLOS conditions, designed to capture diverse human activities and evaluate the HAR system’s robustness across varying spatial and signal conditions.

Two environmental setups were used in this study: (1) the same-room setup and (2) the different-room setup. In the first setup, the environments depicted in Figure 2 were selected to capture the nuances of human activity in both controlled and dynamic conditions. This configuration places both the Wi-Fi router and the Raspberry Pi device in the same room. Within this setup, we conducted two distinct scenarios: one with a clear LOS between the devices, and another with an NLOS condition introduced by placing obstacles between them. The setup also considers factors such as signal strength, potential interference, and the presence of obstacles—all of which can significantly impact the quality and consistency of the collected CSI data. These elements are essential for evaluating the robustness of the HAR system and its ability to classify activities accurately in real time.

Figure 3 illustrates the floor plan of the second setup used for data collection. In this configuration, the Wi-Fi router was placed in one room, while the Raspberry Pi device responsible for collecting CSI was located in a separate room where the test subject performed various activities. This arrangement represents an extreme NLOS scenario, designed to emulate realistic situations commonly found in daily life, for example, when a Wi-Fi router is installed in a living room or bedroom and the user is in another area such as a bathroom. Such a setup allows for evaluating the system’s capability to detect human presence or gestures even when the LOS is completely obstructed, making it highly applicable for non-intrusive activity monitoring in smart home environments.

Figure 3.

Floor plan of the second experimental setup, showing the Wi-Fi router in one room and the Raspberry Pi receiver in a separate room. This extreme NLOS configuration simulates realistic daily-life scenarios, enabling evaluation of the HAR system’s ability to detect human activities when the line of sight is completely obstructed.

Data Collection

We continuously captured CSI data in batches of 100 frames using a Python script that repeatedly executed the following command:

tcpdump -i wlan0 dst port 5500 -c 100
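A minimal sketch of this capture loop, assuming Python's subprocess module; `build_capture_cmd`, `capture_batches`, and the timestamped PCAP filename pattern are our illustration rather than the authors' exact script:

```python
import subprocess
import time
from datetime import datetime

def build_capture_cmd(iface="wlan0", port=5500, count=100, out_file=None):
    """Assemble the tcpdump invocation for one 100-frame CSI batch."""
    cmd = ["tcpdump", "-i", iface, "dst", "port", str(port), "-c", str(count)]
    if out_file is not None:
        cmd += ["-w", out_file]          # write raw packets to a PCAP file
    return cmd

def capture_batches(n_batches, pause_s=0.5):
    """Capture fixed-size batches in a loop, one timestamped PCAP each."""
    for _ in range(n_batches):
        out = datetime.now().strftime("csi_%Y%m%d_%H%M%S.pcap")
        subprocess.run(build_capture_cmd(out_file=out), check=True)
        time.sleep(pause_s)              # brief pause between batches
```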

The collected CSI data was represented as complex numbers. With an 80 MHz bandwidth, each CSI frame consisted of 256 subcarriers. The number of CSI samples collected for each activity class is shown in Table 2.

Activity        Number of Samples
Falling         700
Lying down      300
Still           900
Transitioning   500
Walking         1,000

Table 2.

Number of CSI samples for each activity class.

Sliding Window Technique

During standard CSI capture, the signal patterns at the edges of the capture window may be partially or entirely lost. To address this, we implemented a sliding window technique. Each capture cycle consisted of three scheduled CSI captures: the first began immediately, the second one second after the first, and the third four seconds after the first. After each cycle, the system briefly paused before starting the next iteration.

This setup ensured consistent overlapping between captures, with new capture windows initiated at fixed 1- and 4-second intervals regardless of the status of previous captures. By overlapping segments, we were able to preserve important transitional patterns while also increasing the effective number of training samples.
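The staggered captures within one cycle can be sketched with threading.Timer; `schedule_cycle` is a hypothetical helper, and each timer fires at its fixed offset regardless of whether an earlier capture has finished, as described above:

```python
import threading

# Offsets (seconds) of the three captures within one cycle, from the text:
# the first starts immediately, the second 1 s later, the third 4 s after
# the first.
CAPTURE_OFFSETS_S = (0, 1, 4)

def schedule_cycle(capture_fn, offsets=CAPTURE_OFFSETS_S):
    """Start one cycle of overlapping captures. Each timer calls
    capture_fn(offset) when its delay elapses, independent of the
    status of the previous captures."""
    timers = [threading.Timer(t, capture_fn, args=(t,)) for t in offsets]
    for tm in timers:
        tm.start()
    return timers
```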

Preventing Overfitting to Activity Duration

A common risk in HAR is that models may learn to discriminate activities based on their duration rather than on intrinsic motion-related features. For example, “walking” and “falling” may differ in how long they typically last, and without careful design, a model could exploit this spurious cue rather than focusing on the true signal dynamics.

To mitigate this, we adopted the following two strategies:

  • Fixed-size windows: Each CSI segment contained exactly 100 frames (about 5 seconds). This enforced a consistent temporal length across all activity samples, preventing the model from relying on raw activity duration. Instead, the model must focus on spatial-temporal variations within the CSI data itself.

  • Normalization: Prior to model input, CSI amplitudes were normalized across each window. This reduces the influence of scale variations (e.g., due to environment, user body size, or subtle motion length differences) and encourages the model to capture shape- and trend-related features rather than duration artifacts.

By combining sliding-window segmentation with fixed-length normalization, the model is encouraged to learn subject- and duration-invariant CSI features. This ensures that the classification decision is based on patterns of multipath variations caused by human activity, rather than activity length. Such design choices are especially critical for fall detection, where accurate discrimination between fast events (falling) and visually similar but slower ones (lying down) is essential for real-world deployment.
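The two strategies above can be sketched as follows, assuming amplitude CSI stored as a (frames x subcarriers) NumPy array; the step size of 20 frames is illustrative, not a value reported in the study:

```python
import numpy as np

WINDOW_FRAMES = 100  # fixed segment length (about 5 s of CSI)

def normalize_window(window):
    """Min-max scale one CSI window to [0, 1], computed per window."""
    w = np.asarray(window, dtype=float)
    lo, hi = w.min(), w.max()
    if hi == lo:                       # degenerate (constant) window
        return np.zeros_like(w)
    return (w - lo) / (hi - lo)

def segment(frames, step=20):
    """Cut a (n_frames, n_subcarriers) stream into overlapping,
    fixed-length, per-window-normalized segments."""
    frames = np.asarray(frames, dtype=float)
    return [normalize_window(frames[i:i + WINDOW_FRAMES])
            for i in range(0, len(frames) - WINDOW_FRAMES + 1, step)]
```

Because every segment has exactly 100 frames and is normalized within itself, the classifier cannot exploit raw activity duration or absolute amplitude scale.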

Data Preprocessing

The preprocessing of the raw CSI data involved several stages of filtering and transformation, as illustrated in Figure 4. Initially, we removed null, pilot, and low-value subcarriers to retain only informative parts of the signal. Next, we applied the Hampel filter with a Gaussian scaling factor of 1.4826, a window length of five samples, and a threshold of three standard deviations. This process helped to eliminate outliers while preserving the signal structure. The result is shown in Figure 5. Subsequently, we applied the Savitzky–Golay (SG) filter with a window length of 7 and a polynomial order of 3 to smooth the signal. The smoothed data are shown in Figure 6. The smoothed data were then normalized to ensure a consistent scale across all subcarriers. The normalized output is depicted in Figure 7. Finally, we applied PCA to reduce dimensionality by selecting the top 10 principal components. The PCA-transformed data are visualized in Figure 8.
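As an illustration of two of these stages, the following NumPy-only sketch implements the Hampel identifier with the stated parameters and a top-k PCA projection via SVD; the SG smoothing step, omitted here, would typically use scipy.signal.savgol_filter. Function names are ours, not from the study's code:

```python
import numpy as np

def hampel(x, window=5, n_sigmas=3.0, k=1.4826):
    """Replace outliers with the local median (Hampel identifier).

    window   -- samples per sliding window (5 in this study)
    n_sigmas -- rejection threshold in robust standard deviations (3)
    k        -- Gaussian scale factor for the MAD (1.4826)
    """
    x = np.asarray(x, dtype=float)
    out = x.copy()
    half = window // 2
    for i in range(half, len(x) - half):
        seg = x[i - half:i + half + 1]
        med = np.median(seg)
        mad = k * np.median(np.abs(seg - med))   # robust std estimate
        if mad > 0 and abs(x[i] - med) > n_sigmas * mad:
            out[i] = med                         # replace the outlier
    return out

def pca_reduce(X, k=10):
    """Project (samples x features) data onto the top-k principal
    components, computed via SVD of the mean-centered matrix."""
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)
    U, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    return Xc @ Vt[:k].T
```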

Figure 4.

Heatmap of CSI data after initial preprocessing, including the removal of null, pilot, and low-value subcarriers, retaining only the informative components of the signal.

Figure 5.

Heatmap of CSI data after outlier removal using the Hampel filter (window length = 5, threshold = 3σ, Gaussian scaling factor = 1.4826), showing the cleaned signal while preserving the underlying structure.

Figure 6.

Heatmap of CSI data after smoothing with the Savitzky–Golay filter (window length = 7, polynomial order = 3), showing a cleaner signal while preserving key features.

Figure 7.

Heatmap of CSI data after normalization, showing all subcarriers scaled consistently following smoothing.

Figure 8.

Heatmap of CSI data after PCA transformation, showing the reduced-dimensionality representation using the top 10 principal components.

Model Training and Performance Evaluation

Following data preprocessing, we proceeded to the model training phase, evaluating three machine learning architectures: CNN-LSTM, GNN, and Transformer. Each model was trained with hyperparameter tuning to optimize its performance and minimize overfitting.

To assess and compare performance, we tracked accuracy and loss over training epochs and generated comprehensive classification reports. These reports included accuracy, precision, recall, and F1-score for each activity class. Furthermore, we visualized confusion matrices to gain deeper insight into misclassification patterns and class-specific challenges.

CNN-LSTM Training Procedure

The CNN-LSTM training pipeline starts by organizing the dataset into folders by activity classes: falling, lying down, staying still, transitioning, and walking. Each folder contains CSV files representing individual activity instances, from which CSI values are extracted. Label encoding converts categorical activity names into numeric labels, a necessary step for neural network classification. The dataset is then stratified and split into 80% training and 20% testing sets, ensuring balanced representation of each activity.

Before training, the data are reshaped to fit the CNN-LSTM model by treating each sample as a sequence of CSI frames, with dimensions (sequence length, features, width, and channels). The CNN layers process each frame individually to extract spatial features, and then the LSTM layers analyze these features over time to capture temporal dependencies. This combination enables the model to learn both the detailed spatial characteristics of each CSI snapshot and the evolving patterns across the activity sequence, which is essential for accurately recognizing human movements. The detailed CNN-LSTM model architecture and training parameters are summarized in Table 3.
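The label encoding and stratified 80/20 split described above can be sketched without external ML libraries as follows (in practice, scikit-learn's train_test_split with a stratify argument is the idiomatic choice); `encode_labels` and `stratified_split` are our hypothetical helpers:

```python
import numpy as np

def encode_labels(names):
    """Map activity names to integer labels (alphabetical order)."""
    lut = {a: i for i, a in enumerate(sorted(set(names)))}
    return np.array([lut[n] for n in names]), lut

def stratified_split(X, y, test_frac=0.2, seed=0):
    """80/20 split that preserves each activity's proportion."""
    rng = np.random.default_rng(seed)
    train_idx, test_idx = [], []
    for cls in np.unique(y):
        idx = np.flatnonzero(y == cls)
        rng.shuffle(idx)
        n_test = max(1, int(round(len(idx) * test_frac)))
        test_idx += idx[:n_test].tolist()
        train_idx += idx[n_test:].tolist()
    return X[train_idx], X[test_idx], y[train_idx], y[test_idx]
```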

Parameter                  Value
Input shape                (100, 53, 1, 1)
Conv2D (1st layer)         Filters = 32, kernel size = (3, 1), activation = ReLU
MaxPooling2D (1st layer)   Pool size = (2, 1)
Conv2D (2nd layer)         Filters = 64, kernel size = (3, 1), activation = ReLU
MaxPooling2D (2nd layer)   Pool size = (2, 1)
Flatten                    After 2nd MaxPooling2D
LSTM (1st layer)           128 units, return sequences = True
LSTM (2nd layer)           128 units
Dropout                    0.5
Dense                      64 units, activation = ReLU
Loss function              Sparse categorical cross-entropy
Optimizer                  Adam
Learning rate              0.0005
Early stopping             Monitor = val_loss, patience = 10
Epochs                     150
Batch size                 16
Train split                80%
Test split                 20%

Table 3.

CNN-LSTM model architecture and training parameters.

Training uses the Adam optimizer with a learning rate of 0.0005, optimized with sparse categorical cross-entropy loss since the targets are integer labels. Early stopping halts training if validation loss does not improve for 10 epochs, preventing overfitting. The model trains for up to 150 epochs with a batch size of 16, balancing computational efficiency with performance. This design ensures robust learning of complex activity patterns in real-time scenarios.
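Under the assumption that the per-frame CNN is applied through Keras's TimeDistributed wrapper (consistent with the statement that the CNN layers process each frame individually), Table 3 maps to the following sketch. This is our reconstruction, not the authors' released code:

```python
import tensorflow as tf
from tensorflow.keras import layers, models

def build_cnn_lstm(input_shape=(100, 53, 1, 1), num_classes=5):
    """CNN-LSTM following Table 3; layer ordering inferred from the text."""
    model = models.Sequential([
        layers.Input(shape=input_shape),
        # spatial feature extraction, applied to each of the 100 CSI frames
        layers.TimeDistributed(layers.Conv2D(32, (3, 1), activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D(pool_size=(2, 1))),
        layers.TimeDistributed(layers.Conv2D(64, (3, 1), activation="relu")),
        layers.TimeDistributed(layers.MaxPooling2D(pool_size=(2, 1))),
        layers.TimeDistributed(layers.Flatten()),
        # temporal modeling across the frame sequence
        layers.LSTM(128, return_sequences=True),
        layers.LSTM(128),
        layers.Dropout(0.5),
        layers.Dense(64, activation="relu"),
        layers.Dense(num_classes, activation="softmax"),
    ])
    model.compile(
        optimizer=tf.keras.optimizers.Adam(learning_rate=0.0005),
        loss="sparse_categorical_crossentropy",
        metrics=["accuracy"],
    )
    return model
```

Early stopping (monitor = val_loss, patience = 10) would be passed as a callback to model.fit along with epochs = 150 and batch_size = 16.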

Performance Evaluation of CNN-LSTM

The model exhibits a strong downward trend in training and validation losses, as illustrated in Figure 9, and reaches a validation accuracy of 92.65%, indicating strong generalization capability.

Figure 9.

Training and validation loss and accuracy curves of the CNN-LSTM model, showing a strong downward trend in losses and a final validation accuracy of 92.65%, indicating effective learning and good generalization.

Table 4 presents the classification performance across each class. High precision and recall are achieved overall, with minor confusion noted for the lying down class. The normalized confusion matrix in Figure 10 shows occasional confusion between lying down and falling and between transitioning and walking.

Class           Precision   Recall   F1-score
Falling         0.95        0.95     0.95
Lying down      0.74        0.92     0.82
Still           0.98        0.91     0.94
Transitioning   0.93        0.87     0.90
Walking         0.93        0.96     0.95
Accuracy        0.93

Table 4.

Classification report for CNN-LSTM.

Figure 10.

Normalized confusion matrix of the CNN-LSTM model, highlighting occasional misclassifications between lying down and falling, as well as between transitioning and walking.

Performance Evaluation of CNN-LSTM on PCA-Transformed Data

When training on PCA-transformed CSI data, the CNN-LSTM model achieves even higher test accuracy of 94.85%, with stable convergence as shown in Figure 11. This suggests that PCA effectively reduces dimensionality while retaining meaningful variance.

Figure 11.

Training and validation loss and accuracy curves of the CNN-LSTM model on PCA-transformed CSI data, showing stable convergence and a test accuracy of 94.85%, demonstrating effective dimensionality reduction while preserving key features.

The classification report in Table 5 shows a perfect F1-score for transitioning and excellent performance for the other classes. However, lying down remains the most challenging class, with a precision of 0.84 and recall of 0.70, indicating some confusion with falling and walking, as visualized in Figure 12.

Class           Precision   Recall   F1-score
Falling         0.93        0.96     0.95
Lying down      0.84        0.70     0.76
Still           0.97        0.97     0.97
Transitioning   1.00        1.00     1.00
Walking         0.95        0.97     0.96
Accuracy        0.95

Table 5.

Classification report for CNN-LSTM on PCA-transformed data.

Figure 12.

Normalized confusion matrix of the CNN-LSTM model on PCA-transformed CSI data, showing high classification performance for most activities, with lying down remaining the most challenging class due to some confusion with falling and walking.

GNN Training Procedure

Similar to the previously discussed methods, the GNN training pipeline begins by loading the CSI dataset from separate directories, each corresponding to a specific activity class. Labels for each activity class are numerically encoded using a label encoder.

Next, we transform the CSI data into graph-structured data suitable for input to the GNN. Each CSI sample is converted into a graph in which each node corresponds to a subcarrier, and the features associated with each node consist of a time-series signal of length 100 (the sequence length). A predefined chain graph structure with self-loops connects these nodes, effectively modeling relationships between adjacent subcarriers.
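The chain-with-self-loops edge index can be built as in the NumPy sketch below before being handed to PyTorch Geometric, which stores edges as a (2, E) tensor; the helper name is ours:

```python
import numpy as np

def chain_edges_with_self_loops(n_nodes):
    """Edge index of shape (2, E) for a bidirectional chain over
    subcarrier nodes, plus one self-loop per node, matching the
    predefined graph structure described above."""
    src, dst = [], []
    for i in range(n_nodes - 1):
        src += [i, i + 1]              # chain edges i <-> i+1
        dst += [i + 1, i]
    for i in range(n_nodes):           # self-loops
        src.append(i)
        dst.append(i)
    return np.array([src, dst], dtype=np.int64)
```

For n subcarrier nodes this yields 2(n - 1) chain edges plus n self-loops.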

The dataset is then encapsulated into a custom PyTorch Geometric InMemoryDataset class (CSIGraphDataset), creating graph objects containing node features, edge information, and corresponding labels.

The model itself is defined using PyTorch Geometric's graph convolutional layers. The architecture consists of two graph convolutional (GCNConv) layers, each followed by a Rectified Linear Unit (ReLU) activation, to extract spatial relationships across subcarriers within each CSI graph. A global mean pooling operation is applied to aggregate node-level features into a graph-level representation, which is then passed through a linear fully connected layer to predict activity classes. The model uses a log-softmax activation function for output, suitable for multiclass classification. The model architecture and training parameters of the GNN are presented in Table 6.

Parameter                Value
GCNConv (1st layer)      in_channels = 100, out_channels = 64
Activation (1st layer)   ReLU
GCNConv (2nd layer)      in_channels = 64, out_channels = 64
Activation (2nd layer)   ReLU
Pooling                  global_mean_pool
Linear layer             Linear(64 → 5)
Output activation        log_softmax
Loss function            NLLLoss
Optimizer                Adam
Learning rate            0.0005
Early stopping           Monitor = val_loss, patience = 10
Epochs                   150
Batch size               16
Train split              0.8
Test split               0.2

Table 6.

GNN model architecture and training parameters.

Performance Evaluation of GNN

The performance evaluation of the GNN model shows promising yet slightly lower results compared to the CNN-LSTM models. The loss and accuracy curves, as displayed in Figure 13, indicate good convergence, with validation accuracy reaching 89.41%. The classification report, as presented in Table 7, highlights that the model performs well on activities such as falling, still, transitioning, and walking. However, the model struggles with the lying down class, yielding the lowest F1-score of 0.65, due to both lower precision (0.67) and recall (0.64). This is further reflected in the confusion matrix, where 21% of lying down samples were misclassified as falling, and 14% as walking, as visualized in Figure 14.

Figure 13.

Training and validation loss and accuracy curves of the GNN model, showing good convergence with a validation accuracy of 89.41% despite slightly lower overall performance compared to CNN-LSTM models.

Class           Precision   Recall   F1-score
Falling         0.87        0.92     0.89
Lying down      0.67        0.64     0.65
Still           0.96        0.89     0.93
Transitioning   0.97        0.88     0.92
Walking         0.88        0.96     0.92
Accuracy        0.89

Table 7.

Classification report for GNN.

Figure 14.

Normalized confusion matrix of the GNN model, highlighting strong performance for falling, still, transitioning, and walking, while showing misclassifications for the lying down class, with 21% labeled as falling and 14% as walking.

Performance Evaluation of GNN on PCA-Transformed Data

The performance evaluation of the GNN model trained with the PCA dataset shows that while the overall accuracy is 85.59%, there are clear weaknesses in classifying some activity classes. As shown in the loss and accuracy curves in Figure 15, the model generally converged within 30 epochs, though the validation loss plateaued earlier than the training loss. According to the classification report in Table 8, the lying down class suffers the most, with a precision of 0.46, recall of 0.41, and F1-score of 0.44, indicating that the model has difficulty distinguishing it. Other classes, such as falling, still, transitioning, and walking, perform significantly better, all with F1-scores above 0.87. The normalized confusion matrix, illustrated in Figure 16, also highlights that lying down is frequently confused with falling and walking, which suggests overlapping features or PCA removing critical discriminative information.

Figure 15.

Training and validation loss and accuracy curves of the GNN model on PCA-transformed CSI data, showing general convergence within 30 epochs, with validation loss plateauing earlier than training, and overall accuracy of 85.59%.

Class           Precision   Recall   F1-score
Falling         0.82        0.92     0.87
Lying down      0.46        0.41     0.44
Still           0.94        0.89     0.91
Transitioning   0.94        0.91     0.93
Walking         0.87        0.87     0.87
Accuracy        0.86

Table 8.

Classification report for GNN on PCA-transformed data.

Figure 16.

Normalized confusion matrix of the GNN model on PCA-transformed CSI data, highlighting strong performance for falling, still, transitioning, and walking, while showing frequent misclassifications of the lying down class as falling or walking.

Performance Evaluation of Transformer

The performance evaluation of the Transformer model indicates moderate effectiveness in classifying human activity using CSI data. The model yielded a test accuracy of 71.32%, with the training and validation loss showing consistent convergence, as depicted in Figure 17. However, validation accuracy plateaued below the training accuracy, suggesting slight overfitting.

Figure 17.

Training and validation loss and accuracy curves of the Transformer model on CSI data, showing consistent convergence with a test accuracy of 71.32%, while validation accuracy plateaued below training, indicating slight overfitting.

From the classification report displayed in Table 9, the still class performed best with an F1-score of 0.85, while lying down and transitioning suffered poor recall scores of 0.15 and 0.17, respectively. These two classes are often confused with walking, as displayed in the normalized confusion matrix in Figure 18. For instance, 52% of lying down samples were misclassified as walking, and 45% of transitioning samples were misclassified similarly.

Class           Precision   Recall   F1-score
Falling         0.81        0.80     0.81
Lying down      0.53        0.15     0.23
Still           0.79        0.91     0.85
Transitioning   0.49        0.17     0.25
Walking         0.65        0.92     0.76
Accuracy        0.71

Table 9.

Classification report for the Transformer.

Figure 18.

Normalized confusion matrix of the Transformer model, highlighting strong performance for the still class, while lying down and transitioning are frequently misclassified as walking, with 52% and 45% misclassification rates, respectively.

This implies that although the Transformer model is capable of identifying more stationary actions like still and falling, it struggles with nuanced or ambiguous transitions such as lying down and transitioning, which share overlapping CSI characteristics with walking.

Performance Evaluation of Transformer on PCA-Transformed Data

The evaluation of the Transformer model trained on the PCA-reduced CSI dataset indicates moderate performance, yielding a test accuracy of 83.09%. As shown in the training curves in Figure 19, both training and validation loss decrease consistently, following a smooth downward trajectory. The model’s accuracy also improves progressively across the 75 training epochs. However, a slight overfitting trend emerges near the end, evidenced by the growing disparity between training and validation accuracy.

Figure 19.

Training and validation loss and accuracy curves of the Transformer model on PCA-transformed CSI data, showing consistent improvement across 75 epochs and a test accuracy of 83.09%, with slight overfitting observed near the end of training.

The classification report, as highlighted in Table 10, reveals that falling, still, transitioning, and walking are well recognized with F1-scores above 0.84, whereas lying down is significantly underperforming with an F1-score of only 0.37. The confusion matrix, as visualized in Figure 20, confirms that lying down instances are often misclassified as falling, still, or walking, which compromises the overall reliability for anomaly detection.

Despite the decent average performance, this result suggests that PCA-based dimensionality reduction may not preserve all the discriminative features necessary for distinguishing subtle activities like lying down, indicating a possible trade-off between model efficiency and recognition accuracy.

Class           Precision   Recall   F1-score
Falling         0.85        0.90     0.88
Lying down      0.42        0.33     0.37
Still           0.85        0.88     0.87
Transitioning   0.99        0.88     0.93
Walking         0.83        0.86     0.84
Accuracy        0.83

Table 10.

Classification report for the Transformer on PCA-transformed data.

Figure 20.

Normalized confusion matrix of the Transformer model on PCA-transformed CSI data, showing strong recognition for falling, still, transitioning, and walking, while lying down is frequently misclassified as falling, still, or walking.

The three evaluated models, CNN-LSTM, GNN, and Transformer, each demonstrated unique strengths and sensitivities to PCA-transformed data. The CNN-LSTM model performed consistently well both with and without PCA, indicating its robustness in handling high-dimensional raw CSI data. A slight accuracy improvement after PCA implies that the model benefits from noise reduction and feature selection while retaining essential temporal dynamics. On the other hand, the GNN model yielded better results without PCA. This suggests that PCA likely removed key spatial relationships among CSI subcarriers—relationships that GNNs are specifically designed to leverage. In contrast, the Transformer model experienced significant performance gains with PCA, demonstrating its preference for lower-dimensional inputs to avoid complications associated with processing complex raw data.

The variation in performance across models can be attributed to architectural suitability. CNN-LSTM excels at learning both spatial and temporal features, making it well equipped to process sequential CSI data in its raw form. GNNs, which are structured to model relationships between nodes (e.g., subcarriers), rely heavily on the original structure of the input. Applying PCA alters this structure and likely impairs the GNN’s ability to fully capture spatial dependencies. The Transformer, which depends on self-attention mechanisms, benefits from dimensionality reduction since lower-dimensional inputs simplify attention computations and reduce noise, thereby improving its ability to generalize.

PCA proved to be a double-edged sword. While it enhances training efficiency by reducing feature space and filtering noise, it can also eliminate subtle but crucial features necessary for accurate classification. This trade-off was particularly evident in the GNN and Transformer models, where performance on certain classes—especially the lying down class—dropped after applying PCA. This suggests that important discriminative features may have been lost during dimensionality reduction, impairing class-wise sensitivity. Therefore, while PCA can be beneficial for simplifying data, its impact must be carefully assessed based on the architecture and task.

Real-Time System Architecture

To enable seamless HAR in dynamic environments, we developed a real-time processing pipeline that leverages a sliding window mechanism. This technique allows continuous packet capture, data preprocessing, and activity classification, ensuring timely and accurate system responses. The pipeline operates on an end-to-end automated framework, integrating real-time packet acquisition, signal transformation, and predictive modeling, as illustrated in Figure 21.

Figure 21.

Experimental environment for real-time HAR, showing the setup for continuous packet capture, data preprocessing, and activity classification using a sliding window pipeline for dynamic activity monitoring.

Specifically, Wi-Fi packet captures are initiated remotely using tcpdump over Secure Shell (SSH), allowing efficient and scalable deployment across edge devices. Each capture batch consists of 100 packets, stored into timestamped Packet Capture (PCAP) files at predefined intervals. Importantly, the sliding window mechanism creates overlapping capture sessions, which ensures no loss of information and facilitates uninterrupted feature extraction—a critical requirement for real-time classification systems.

Upon collection, each PCAP file is instantly converted to CSV format using a custom parsing script. This intermediate format simplifies downstream processing by structuring the raw CSI in a tabular format. Preprocessing is then performed on each CSV file to enhance data quality and uniformity. Non-relevant subcarriers are excluded to reduce noise and computational overhead. Subsequently, a Hampel filter is applied to eliminate outliers, while the SG filter is employed to smooth temporal fluctuations in the CSI signals. These signal processing techniques are chosen for their robustness in handling real-world, noisy wireless data. Finally, min-max normalization is used to scale the data features between 0 and 1, ensuring consistent input ranges for the deep learning model.

The normalized data are subsequently reshaped to fit the input dimensions required by our pre-trained CNN-LSTM model. The model, which combines spatial feature extraction via convolutional layers and temporal sequence modeling through LSTM units, is used to predict the human activity class in real time. This architecture is well suited for CSI data, where both spatial variations across subcarriers and temporal dynamics across packet sequences are significant for accurate classification.

To manage multiple tasks such as packet capture, file conversion, preprocessing, and inference concurrently, the system employs Python’s threading module in combination with the ThreadPoolExecutor. This multi-threaded design ensures that each pipeline component runs in parallel without introducing latency. Moreover, real-time file system monitoring is implemented using the watchdog library, enabling immediate detection and processing of new PCAP files as they are generated.
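The concurrency portion of this design can be sketched with the standard library alone; the real system additionally uses the watchdog library for file-system events, which is omitted here, and the stage functions below are placeholders for the actual conversion, preprocessing, and inference steps:

```python
from concurrent.futures import ThreadPoolExecutor

def run_pipeline(pcap_paths, convert, preprocess, predict, workers=3):
    """Run the convert → preprocess → infer chain for each capture file
    concurrently, mirroring the ThreadPoolExecutor design in the text."""
    def process(path):
        return predict(preprocess(convert(path)))

    with ThreadPoolExecutor(max_workers=workers) as pool:
        # pool.map preserves input order, so each prediction lines up
        # with its originating capture file
        return list(pool.map(process, pcap_paths))
```

In deployment, a watchdog observer would enqueue each newly written PCAP path into this pipeline as soon as the capture completes.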

Real-Time Performance Validation

We validated the real-time performance of the CNN-LSTM model with four activity classes: falling, staying still, transitioning, and walking. The model was trained with 500 samples per class and achieved an overall accuracy of 95.48%. During real-time validation, each activity was performed 20 times; the results are shown in Table 11.

Actual Activity   Falling   Staying Still   Transitioning   Walking
Falling           18        0               0               2
Staying still     0         19              1               0
Transitioning     0         0               17              3
Walking           0         0               0               20

Table 11.

Real-time classification results of the CNN-LSTM model with four activity classes.

After confirming good performance with four activities, we introduced a fifth class, lying down, and repeated the real-time validation with and without PCA under both LOS and NLOS conditions. Each of the five activities (falling, lying down, staying still, transitioning, and walking) was performed 20 times. Results for CNN-LSTM with and without PCA are shown in Tables 12 to 15.

Actual (LOS)    Falling   Lying Down   Staying Still   Transitioning   Walking
Falling         18        2            0               0               0
Lying down      8         12           0               0               0
Staying still   0         0            19              1               0
Transitioning   0         0            0               18              2
Walking         0         0            0               1               19

Table 12.

CNN-LSTM results without PCA in LOS scenario.

Actual (NLOS)   Falling   Lying Down   Staying Still   Transitioning   Walking
Falling         16        3            0               0               1
Lying down      10        10           0               0               0
Staying still   0         0            20              0               0
Transitioning   0         0            0               18              2
Walking         0         0            0               0               20

Table 13.

CNN-LSTM results without PCA in NLOS scenario.

Actual (LOS)    Falling   Lying Down   Staying Still   Transitioning   Walking
Falling         17        2            0               0               1
Lying down      12        7            0               0               1
Staying still   0         0            19              1               0
Transitioning   0         0            0               17              3
Walking         0         0            0               1               19

Table 14.

CNN-LSTM results with PCA in LOS scenario.

Actual (NLOS)   Falling   Lying Down   Staying Still   Transitioning   Walking
Falling         18        0            0               0               2
Lying down      6         12           0               0               2
Staying still   0         1            19              0               0
Transitioning   0         0            0               19              1
Walking         0         0            0               0               20

Table 15.

CNN-LSTM results with PCA in NLOS scenario.

A recurring issue was the confusion between falling and lying down across LOS and NLOS conditions. This is likely due to the postural similarity reflected in similar CSI patterns. While PCA helped reduce this confusion, it did not eliminate it.

PCA improved model performance in NLOS scenarios by filtering out irrelevant variance and highlighting key features, particularly improving classification of staying still, transitioning, and walking.

Signal obstruction under NLOS increased the falling vs. lying down confusion, but the model remained robust for other classes. Overall, PCA enhanced robustness, but additional techniques may be needed to resolve overlapping patterns in similar activities.

The classification results for the CNN-LSTM model in the second setup, in which the Wi-Fi router and Raspberry Pi were placed in different rooms, are shown in Table 16. This setup, depicted in Figure 3, simulates an extreme NLOS scenario that reflects a common real-world situation: the Wi-Fi router and the monitoring device (the Raspberry Pi) are located in separate rooms, and the monitored person is in yet another room.

Actual Activity   Falling   Staying Still   Transitioning   Walking
Falling              16           2               0             2
Staying still         0          19               1             0
Transitioning         0           0              17             3
Walking               1           0               0            19

Table 16.

CNN-LSTM results with PCA in different room (extreme NLOS scenario with four activity classes).

The system performs well across all activity categories, even in this challenging extreme NLOS setup. It correctly identifies 16 out of 20 Falling instances, 19 out of 20 Staying Still instances, and 19 out of 20 Walking instances, with only a few misclassifications. The system is particularly effective at detecting minimal movement and static activities, such as Staying Still, and can reliably identify dynamic actions like Walking. For Transitioning, the system achieved 17 correct classifications but had a few more misclassifications, likely due to the subtle nature of the movement in NLOS conditions.

Overall, the results demonstrate that the CNN-LSTM model is robust in recognizing human activities even when spatial separation between devices and the resulting signal degradation make sensing challenging. These results highlight the system’s suitability for practical smart home monitoring applications, showing that it classifies activities effectively even in non-ideal conditions.

Real-Time Performance Validation in a Hospital Environment

To evaluate the performance of the proposed CNN-LSTM model in a real-world setting, we performed real-time experiments in a real hospital room. The model was trained to classify four types of human activities—falling, staying still, transitioning, and walking—based on temporal-spatial features derived from sensor data.

As shown in Table 17, the model achieved promising classification performance in a complex and dynamic hospital environment. The most critical activity class—falling—was correctly classified in 12 out of 20 cases, resulting in a true positive rate of 60%. Misclassifications were primarily observed with staying still (five instances) and walking (three instances), which may be attributed to similar postural signatures or transitional motions immediately following a fall.

Actual Activity   Falling   Staying Still   Transitioning   Walking
Falling              12           5               0             3
Staying still         0          15               1             4
Transitioning         0           3              15             2
Walking               2           1               1            16

Table 17.

Confusion matrix of CNN-LSTM model in real hospital room.

The staying still class exhibited strong recognition accuracy with 15 correct predictions out of 20, although four samples were misclassified as walking. This suggests that the model may sometimes conflate motionless periods at the end of a walking sequence with intentional stillness.

Transitioning activities were recognized correctly in 15 of 20 cases, with occasional confusion with staying still (three instances) and walking (two instances), likely due to the overlapping temporal nature of those activities.

Walking was the most reliably detected class, with 16 correct predictions. Only four samples were misclassified: two as falling, and one each as staying still and transitioning. This demonstrates the robustness of the model in identifying dynamic movement patterns associated with walking.

Overall, the results demonstrate that the CNN-LSTM model can effectively distinguish between critical human activities in real time within a hospital environment, which is inherently noisy and filled with unpredictable human motion. While further tuning is necessary to improve fall detection sensitivity, the model shows strong potential for integration into intelligent patient monitoring systems.
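The per-class figures discussed above follow directly from the confusion matrix in Table 17; the short NumPy check below reproduces them.

```python
import numpy as np

# Rows = actual class, columns = predicted class, in the order:
# falling, staying still, transitioning, walking (values from Table 17).
cm = np.array([
    [12,  5,  0,  3],
    [ 0, 15,  1,  4],
    [ 0,  3, 15,  2],
    [ 2,  1,  1, 16],
])

recall = np.diag(cm) / cm.sum(axis=1)     # per-class sensitivity
precision = np.diag(cm) / cm.sum(axis=0)  # per-class precision
accuracy = np.trace(cm) / cm.sum()        # overall accuracy

print(recall[0])   # falling sensitivity: 12/20 = 0.6
print(accuracy)    # (12+15+15+16)/80 = 0.725
```

The 60% falling sensitivity quoted above is exactly `recall[0]`, and the hospital-room overall accuracy works out to 72.5% across the 80 trials.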

Generalization Performance

Assessing how well a model generalizes to unseen individuals is essential for Wi-Fi CSI-based HAR and fall detection. In real-world deployments, models must reliably identify activities across a variety of people, environments, and postural behaviors. However, due to resource constraints, our current dataset includes only two participants.

To approximate subject-independent evaluation, we adopt a leave-one-participant-out-like strategy: the models are trained exclusively on data from one participant and then tested on the combined data from both participants. While this approach does not constitute a full Leave-One-Subject-Out (LOSO) protocol, it provides preliminary insight into the model’s ability to capture subject-invariant features from CSI data and its potential for generalization beyond the training individual.
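A minimal sketch of such a subject-based split is shown below, here testing only on the held-out participant, which is the strictest variant. The array shapes, feature dimensions, and subject labels are placeholders standing in for the real windowed CSI features.

```python
import numpy as np

# Hypothetical dataset: 200 feature windows of dimension 64, five activity
# classes, produced by two participants. Real CSI features replace this.
rng = np.random.default_rng(1)
X = rng.normal(size=(200, 64))
y = rng.integers(0, 5, size=200)
subject = np.array([1] * 120 + [2] * 80)  # which participant made each window

train_mask = subject == 1                 # train exclusively on participant 1
test_mask = subject == 2                  # evaluate on the unseen participant

X_train, y_train = X[train_mask], y[train_mask]
X_test, y_test = X[test_mask], y[test_mask]
print(X_train.shape, X_test.shape)        # (120, 64) (80, 64)
```

A full LOSO protocol would repeat this loop once per participant, holding each one out in turn and averaging the resulting scores.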

The results indicate that, although performance slightly decreases compared to within-subject evaluations, the CNN-LSTM model maintains strong discriminative capability across the unseen participants. This suggests that the model is able to extract meaningful patterns from CSI that are not overly dependent on the specific characteristics of a single participant.

We acknowledge that this evaluation is limited by the small participant pool and does not fully reflect real-world variability. Future work will involve a rigorous LOSO protocol with a substantially larger and more diverse cohort of participants, incorporating variations in age, body type, and movement style. Expanding the dataset and testing across diverse scenarios will enable a more thorough assessment of generalization performance, which is crucial for deploying CSI-based fall detection systems in healthcare and assisted living environments.

Comparison With State-of-the-Art

To contextualize the contribution of our system, we benchmarked it against recent state-of-the-art methods in CSI-based HAR published between 2021 and 2025. These works include small-scale and offline approaches [12, 14, 15], architecture-focused models such as InceptionTime [13], and more recent efforts employing mobile receivers, NAS-pruned networks, adaptive convolutional systems, and self-supervised or cross-domain learning [17–21].

As Table 18 shows, most existing approaches are constrained by small datasets, controlled environments, and offline evaluation pipelines. Even recent advances that improve efficiency or adaptability, such as NAS-pruned or adaptive CNN models, still lack validation in real-time and real-world scenarios. Self-supervised and cross-domain methods improve representation learning, but they remain restricted to offline analysis without deployment considerations.

Study                    Dataset Size   Environment                    Real-Time   Limitations
Moshiri et al. [14]      ~420           Single lab                     No          Small dataset, offline
Lowe et al. [12]         ~1,100         One site                       Partial     Low-cost, but limited generalization
Forbes et al. [15]       ~800           1–2 rooms                      No          Environment-specific
Zhuravchak et al. [13]   Medium         Controlled lab                 No          InceptionTime model, offline only
Yan et al. [16]          Medium         Lab + office                   No          Passive Wi-Fi, not real-time
Hu et al. [17]           Small          Construction site              No          Mobile receiver, not scalable
Youm & Go [18]           Medium         Indoor                         Partial     NAS/pruning, no deployment
Bayad et al. [19]        Medium         Indoor                         No          Adaptive CNN, unseen env. only
Logacjov [20]            Medium         Indoor                         No          Self-supervised, offline survey
Thukral et al. [21]      Medium         Indoor                         No          Few-shot transfer, not deployed
Our work (2025)          ~3,400         Multi-site indoor (LOS+NLOS)   Yes         End-to-end, real-time with preprocessing, sliding windows, PCA, low-cost hardware

Table 18.

Comparison with state-of-the-art CSI-based HAR systems.

In contrast, our work distinguishes itself in three critical ways. First, it employs a substantially larger and more diverse dataset collected across multiple indoor settings, including both LOS and NLOS scenarios. Second, it integrates a robust preprocessing pipeline, including subcarrier selection, Hampel and SG filtering, normalization, and PCA, to enhance signal quality and reduce overfitting. Third, and most importantly, it delivers an end-to-end real-time HAR system on low-cost hardware, demonstrating practical deployability rather than remaining confined to simulation or offline experiments. This positions our system as a step forward toward bridging the gap between academic prototypes and scalable real-world HAR solutions.
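The preprocessing chain mentioned above (Hampel outlier removal, Savitzky-Golay smoothing, normalization) can be sketched per subcarrier as follows. The window sizes and the outlier threshold are illustrative defaults, not the tuned values used in the experiments.

```python
import numpy as np
from scipy.signal import savgol_filter

def hampel(x, k=3, t0=3.0):
    """Replace impulsive outliers with the rolling median (window 2k+1)."""
    y = x.copy()
    for i in range(len(x)):
        lo, hi = max(0, i - k), min(len(x), i + k + 1)
        med = np.median(x[lo:hi])
        mad = 1.4826 * np.median(np.abs(x[lo:hi] - med))
        if mad > 0 and abs(x[i] - med) > t0 * mad:
            y[i] = med
    return y

# Synthetic stand-in for one subcarrier's amplitude trace, with one
# impulsive CSI spike injected at sample 50.
rng = np.random.default_rng(0)
sig = np.sin(np.linspace(0, 4 * np.pi, 200)) + 0.05 * rng.normal(size=200)
sig[50] = 10.0

clean = hampel(sig)                                            # outlier removal
smooth = savgol_filter(clean, window_length=11, polyorder=3)   # SG smoothing
norm = (smooth - smooth.mean()) / smooth.std()                 # z-score normalization
print(abs(clean[50]) < 2)  # True: the spike was replaced by the local median
```

The same three steps run on every subcarrier retained after subcarrier selection, before windows are handed to PCA and the classifier.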

Conclusion and Future Directions

This study proposed a non-intrusive fall detection system using Wi-Fi CSI for HAR, addressing the limitations of traditional solutions such as CCTV and wearable devices. Several deep learning models were evaluated, with the CNN-LSTM model achieving the best performance of 94.85% accuracy with PCA. The system demonstrated strong potential for reliable, privacy-preserving fall detection. Real-time implementation achieved low latency (2.3 seconds per sliding window) and was validated under both LOS and challenging NLOS conditions, including extreme scenarios with spatial separation between Wi-Fi devices. These results confirm that the approach offers a practical, low-cost, and effective solution suitable for elderly care and healthcare environments.
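A minimal sketch of the real-time sliding-window loop follows: one thread stands in for the tcpdump capture process and feeds parsed frames into a queue, while the main thread maintains the window and triggers inference once it is full. The frame contents, window length, and frame count are placeholders, and the model call is elided.

```python
import queue
import threading
from collections import deque

frames = queue.Queue()
window = deque(maxlen=100)  # illustrative sliding-window length

def capture():
    # In the real system this thread parses packets from a live tcpdump
    # stream; here it feeds 250 dummy 52-subcarrier frames, then a sentinel.
    for i in range(250):
        frames.put([float(i)] * 52)
    frames.put(None)

t = threading.Thread(target=capture)
t.start()

inferences = 0
while True:
    frame = frames.get()
    if frame is None:
        break
    window.append(frame)        # deque drops the oldest frame automatically
    if len(window) == window.maxlen:
        inferences += 1          # model(window) would run here
t.join()
print(inferences)  # 151: one inference per frame once the window is full
```

Decoupling capture from inference this way keeps packet parsing from blocking on model latency, which is the role the multithreaded Python pipeline plays in the deployed system.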

Limitations and Future Work

Despite its promise, the study has several limitations that point to directions for future research:

  • Dataset Size and Diversity: The dataset was limited in participant number and activity samples, with 700 falling events and 2,700 normal activity events (lying down, staying still, transitioning, and walking). This constrains generalization, particularly for distinguishing similar postures such as falling vs. lying down. Expanding the dataset to include more subjects, varied environments, and activity patterns will improve robustness.

  • Generalization to Unseen Individuals: Subject-independent evaluations were preliminary, with models trained on one participant and tested on others as a proof-of-concept. A comprehensive LOSO evaluation with a larger cohort is necessary to ensure real-world applicability.

  • Model Interpretability: CNN-LSTM models act as black boxes, which can limit trust in healthcare applications. Future work should explore explainability techniques, such as saliency maps or attention visualization, to understand which CSI features influence classification decisions.

  • Data Augmentation: Rare but critical classes, especially falling and lying down, should be enriched with diverse synthetic or real samples to reduce confusion under LOS and NLOS conditions.

  • Multi-Sensor Fusion: Integrating Wi-Fi CSI with other modalities—such as accelerometers, gyroscopes, or vision-based sensors—may enhance accuracy and robustness, particularly in complex environments like hospitals.

  • Lightweight Deployment: Optimizing models for edge devices such as Raspberry Pi using Tiny Machine Learning or efficient architectures (e.g., MobileNet, EfficientNet) would improve portability, energy efficiency, and practical usability.

  • Multi-Person Recognition: Extending the system to simultaneously recognize multiple individuals remains a challenging yet important goal for smart home and hospital applications.

In summary, CSI-based fall detection with CNN-LSTM demonstrates high accuracy (up to 95.5% for four-class scenarios with PCA), low latency, and resilience in both controlled and real-world hospital environments. The sliding-window pipeline, multithreaded Python processing, and real-time tcpdump capture confirm practical feasibility. Addressing these limitations, particularly dataset diversity, fall-detection sensitivity, interpretability, and lightweight deployment, will be key to realizing its full potential as a scalable, privacy-preserving monitoring solution for healthcare and eldercare.

Acknowledgments

The authors would like to sincerely thank Ph.D. Parit Plainkum, M.D., specializing in Emergency Medicine, and his team at Samitivej Srinakarin Hospital, our official partner, for their invaluable support and insightful suggestions in enabling the real-time performance validation of our system in a hospital environment. Their expertise and collaboration were essential to the successful completion of this study.

The authors acknowledge the use of ChatGPT for language polishing of the manuscript.

Author Contributions

First Author (Dit Preechakarnjanadit): Conceptualization, Data curation, Formal analysis, Investigation, Methodology, Software, Writing – original draft; Second Author (Attaphongse Taparugssanagorn): Conceptualization, Formal analysis, Funding acquisition, Methodology, Project administration, Resources, Supervision, Visualization, Validation, Writing – review & editing.

Funding

This research did not receive external funding from any agencies.

Ethical statement

Not applicable.

Data availability statement

Source data not available for this article.

Conflict of Interest

The authors declare no conflict of interest.

References

  1. World Health Organization. Aging and life course unit, WHO global report on falls prevention in older age, Nonserial Publication Series, World Health Organization, 2008, [Accessed 2025 Nov 4]. Available from: https://books.google.co.th/books?id=ms9o2dvfaQkC.
  2. Chen J, Kwong K, Chang D, Luk J, Bajcsy R, Wearable sensors for reliable fall detection, in: 2005 IEEE Engineering in Medicine and Biology 27th Annual Conference, IEEE, 2006, pp. 3551–3554. doi:10.1109/IEMBS.2005.1617246.
  3. Ramachandran A, Karuppiah A. A survey on recent advances in wearable fall detection systems. Biomed Res Int. 2020;2020:1–17. doi:10.1155/2020/2167160.
  4. Feng P, Yu M, Naqvi SM, Chambers JA, Deep learning for posture analysis in fall detection, in: 2014 19th International Conference on Digital Signal Processing, IEEE, 2014, pp. 12–17. doi:10.1109/ICDSP.2014.6900806.
  5. Boudouane I, Makhlouf A, Harkat MA, Hammouche MZ, Saadia N, Cherif AR. Fall detection system with portable camera. J Ambient Intell Human Comput. 2020;11(7):2647–2659. doi:10.1007/s12652-019-01326-x.
  6. Chowdhury TZ, Leung C, Miao CY, WiHACS: leveraging WiFi for human activity classification using OFDM subcarriers’ correlation, in: 2017 IEEE Global Conference on Signal and Information Processing (GlobalSIP), IEEE, 2017, pp. 338–342. doi:10.1109/GlobalSIP.2017.8308660.
  7. Wang F, Gong W, Liu J, Wu K. Channel selective activity recognition with WiFi: a deep learning approach exploring wideband information. IEEE Trans Netw Sci Eng. 2018;7(1):181–192. doi:10.1109/TNSE.2018.2825144.
  8. Tan B, Chen Q, Chetty K, Woodbridge K, Li W, Piechocki R. Exploiting WiFi channel state information for residential healthcare informatics. IEEE Commun Mag. 2018;56(5):130–137. doi:10.1109/MCOM.2018.1700064.
  9. Albert MV, Kording K, Herrmann M, Jayaraman A. Fall classification by machine learning using mobile phones. PLOS ONE. 2012;7(5):e36556. doi:10.1371/journal.pone.0036556.
  10. Pang I, Okubo Y, Sturnieks D, Lord SR, Brodie MA. Detection of near falls using wearable devices: a systematic review. J Geriatr Phys Ther. 2019;42(1):48–56. doi:10.1519/JPT.0000000000000181.
  11. De Miguel K, Brunete A, Hernando M, Gambao E. Home camera-based fall detection system for the elderly. Sensors. 2017;17(12):2864. doi:10.3390/s17122864.
  12. Lowe H, Lamahewage M, Gunasekera K, Toward a low-cost WiFi-based real-time human activity recognition system, in: 2022 IEEE International Conference on Omni-layer Intelligent Systems (COINS), IEEE, Barcelona, Spain, 2022, pp. 1–6. doi:10.1109/COINS54846.2022.9854935.
  13. Zhuravchak A, Kapshii O, Pournaras E. Human activity recognition based on Wi-Fi CSI data: a deep neural network approach. Procedia Comput Sci. 2022;198:59–66. doi:10.1016/j.procs.2021.12.211.
  14. Moshiri PF, Shahbazian R, Nabati M, Ghorashi SA. A CSI-based human activity recognition using deep learning. Sensors. 2021;21(21):7225. doi:10.3390/s21217225.
  15. Forbes G, Massie S, Craw S, WiFi-based human activity recognition using raspberry Pi, in: 2020 IEEE 32nd International Conference on Tools with Artificial Intelligence (ICTAI), IEEE, 2020, pp. 722–730. doi:10.1109/ICTAI50040.2020.00115.
  16. Yan H, Zhang Y, Wang Y, Xu K. WiAct: a passive WiFi-based human activity recognition system. IEEE Sens J. 2020;20(1):296–305. doi:10.1109/JSEN.2019.2938245.
  17. Hu Y, Li H, Cheng M, Zhang M, Han S, Umer W. A mobile receiver WiFi-CSI approach for fall detection of construction workers. Dev Built Environ. 2025;23:1–13. doi:10.1016/j.dibe.2025.100745.
  18. Youm S, Go S. Lightweight and efficient CSI-based human activity recognition via bayesian optimization-guided architecture search and structured pruning. Appl Sci. 2025;15(2):1–16. doi:10.3390/app15020890.
  19. Bayad I, Mahfouz S, Samrouth K, Mourad-Chehade F, Amoud H, Adaptive fall detection using WiFi CSI for unseen environments and new individuals, in: 2025 IEEE Medical Measurements & Applications (MeMeA), IEEE, Chania, Greece, 2025, pp. 1–6. doi:10.1109/MeMeA65319.2025.11068083.
  20. Logacjov A. Self-supervised learning for accelerometer-based human activity recognition: a survey. Proc ACM Interact Mob Wearable Ubiquitous Technol. 2024;8(4):1–42. doi:10.1145/3699767.
  21. Thukral M, Haresamudram H, Plötz T. Cross-Domain HAR: few-shot transfer learning for human activity recognition. ACM Trans Intell Syst Technol. 2025;16(1):1–35. doi:10.1145/3704921.
  22. Xiao J, Wu K, Yi Y, Ni LM, FIFS: fine-grained indoor fingerprinting system, in: 2012 21st International Conference on Computer Communications and Networks (ICCCN), IEEE, 2012, pp. 1–7. doi:10.1109/ICCCN.2012.6289200.
  23. Wang X, Gao L, Mao S. CSI phase fingerprinting for indoor localization with a deep learning approach. IEEE Internet Things J. 2016;3(6):1113–1123. doi:10.1109/JIOT.2016.2558659.
  24. Rohde & Schwarz, 802.11ac technology introduction, [Internet], Rohde & Schwarz; [Accessed 2025 Nov 4]. Available from: https://scdn.rohde-schwarz.com/ur/pws/dl_downloads/dl_application/application_notes/1ma192/1MA192_7e_80211ac_technology.pdf.
  25. Jolliffe IT. Principal component analysis. 2nd ed. Springer Series in Statistics. New York: Springer; 2002. doi:10.1007/b98835.

