Saturday, December 14. 2024
Lua
CMake
- CMake: the Good, the Bad, the Weird - ... I strongly recommend that any developer learning or struggling with CMake purchase Professional CMake -- I have found it very helpful in explaining things where most other resources haven't, and it is consistently updated with new major versions of CMake.
Saturday, November 2. 2024
arXiv
One of the main limitations for the development and deployment of many Green Radio Frequency Identification (RFID) and Internet of Things (IoT) systems is access to energy sources. In this respect, batteries are the main option in energy-constrained scenarios, but their use is limited to certain cases, whether because of the constraints imposed by a reduced form factor, their limited lifespan, or the characteristics of the environment itself (e.g. operating temperature, risk of burning, need for fast response, sudden voltage variations). In this regard, supercapacitors present an interesting alternative for such environments, although, due to their short-term capacity, they must be combined with an alternative energy supply mechanism. Energy harvesting mechanisms, in conjunction with ultra-low-power electronics, supercapacitors, and various methods to improve the efficiency of communications, have enabled the emergence of battery-less passive electronic devices such as sensors, actuators, and transmitters. This paper presents a novel analysis of the performance of a vibration-based energy harvesting system for Green RFID and IoT applications in the field of maritime transport. The results show that the proposed system can charge half of a 1.2 F supercapacitor in about 72 minutes, providing a stable current of around 210 µA and a power output of 0.38 mW.
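A quick back-of-the-envelope check of those figures, assuming a constant charging current (a simplification; real harvester output varies), using Q = I*t and dV = Q/C:
# Sanity check of the reported charging figures, assuming a constant
# charging current. Values are taken from the abstract above.
C = 1.2          # supercapacitor capacitance, farads
I = 210e-6       # reported average charging current, amperes
t = 72 * 60      # reported charging time, seconds

Q = I * t        # accumulated charge, coulombs
dV = Q / C       # resulting voltage rise, volts
P = 0.38e-3      # reported power output, watts
V_implied = P / I   # voltage implied by P = V * I

print(f"charge accumulated: {Q:.3f} C")
print(f"voltage rise on 1.2 F: {dV:.2f} V")
print(f"voltage implied by 0.38 mW at 210 uA: {V_implied:.2f} V")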
Friday, August 30. 2024
Machine Learning
Artificial Neural Network and Deep Learning: Fundamentals and Theory
"Artificial Neural Network and Deep Learning: Fundamentals and Theory" offers a comprehensive exploration of the foundational principles and advanced methodologies in neural networks and deep learning. This book begins with essential concepts in descriptive statistics and probability theory, laying a solid groundwork for understanding data and probability distributions. As the reader progresses, they are introduced to matrix calculus and gradient optimization, crucial for training and fine-tuning neural networks. The book delves into multilayer feed-forward neural networks, explaining their architecture, training processes, and the backpropagation algorithm. Key challenges in neural network optimization, such as activation function saturation, vanishing and exploding gradients, and weight initialization, are thoroughly discussed. The text covers various learning rate schedules and adaptive algorithms, providing strategies to optimize the training process. Techniques for generalization and hyperparameter tuning, including Bayesian optimization and Gaussian processes, are also presented to enhance model performance and prevent overfitting. Advanced activation functions are explored in detail, categorized into sigmoid-based, ReLU-based, ELU-based, miscellaneous, non-standard, and combined types. Each activation function is examined for its properties and applications, offering readers a deep understanding of their impact on neural network behavior. The final chapter introduces complex-valued neural networks, discussing complex numbers, functions, and visualizations, as well as complex calculus and backpropagation algorithms. This book equips readers with the knowledge and skills necessary to design, and optimize advanced neural network models, contributing to the ongoing advancements in artificial intelligence.
Sunday, August 18. 2024
Debian Linux Grub initrd recovery
During the early days of Debian Linux, one could get away with a 100 MB boot partition. With the explosion of firmware files included in the initramfs, the boot directory now needs to be 500 MB or even 1 GB in size. I have a couple of older machines I have not yet rebuilt, whose boot partitions are limited in size, so I resort to copying various initrd.img files in and out as I upgrade kernels or boot into older kernel versions.
Sometimes I forget to copy an image back into /boot and run update-grub. When that happens, I have to boot into GRUB and use its command line. Some commands that I run are as follows:
Set a variable so when requesting help, you can page through entries:
grub> set pager=1
List the various mount points:
grub> ls
List the files in a particular mount point:
grub> ls (hd0,msdos2)/root
Startup commands, depending upon where linux and initrd files are found:
grub> set root=(hd0,msdos2)
grub> linux (hd0,msdos1)/vmlinuz-6.1.0-15-amd64 root=LABEL=main
grub> initrd (hd0,msdos2)/root/initramfs/initrd.img-6.1.0-15-amd64
grub> boot
Tuesday, August 6. 2024
Classical Machine Learning: Seventy Years of Algorithmic Learning Evolution
Machine learning (ML) has transformed numerous fields, but understanding its foundational research is crucial for its continued progress. This paper presents an overview of the significant classical ML algorithms and examines the state-of-the-art publications spanning seven decades through an extensive bibliometric analysis study. We analyzed a dataset of highly cited papers from prominent ML conferences and journals, employing citation and keyword analyses to uncover critical insights. The study further identifies the most influential papers and authors, reveals the evolving collaborative networks within the ML community, and pinpoints prevailing research themes and emerging focus areas. Additionally, we examine the geographic distribution of highly cited publications, highlighting the leading countries in ML research. This study provides a comprehensive overview of the evolution of traditional learning algorithms and their impacts. It discusses challenges and opportunities for future development, focusing on the Global South. The findings from this paper offer valuable insights for both ML experts and the broader research community, enhancing understanding of the field's trajectory and its significant influence on recent advances in learning algorithms.
Nowadays, new network architectures must manage more data than traditional network topologies, which demands increasingly robust and scalable network structures. As traditional data networks grow, adapt, and change to handle large volumes of information, it becomes necessary to incorporate the virtualization of network functions into information-centric networks, so that the cost and profit of the functions on the network are balanced between user and provider. NFV (Network Functions Virtualization) refers to network structures designed on top of IT virtualization technologies, which make it possible to virtualize the functions found in network nodes, connected through routing tables, and thereby offer communication services to various types of customers. Information-centric networks (ICN), unlike traditional data networks that exchange information between hosts using data packets and TCP/IP communication protocols, route by the content of the data itself: data traveling through the network is stored temporarily in a routing table located in the CR (Content Router) so it can be reused later, which reduces both operating and capital costs. The purpose of this work is to analyze how the virtualization of network functions is integrated into the field of information-centric networks. The advantages and disadvantages of both architectures are also considered and presented as a critical analysis of the current difficulties and future trends of both network topologies.
OpenLogParser: Unsupervised Parsing with Open-Source Large Language Models
Log parsing is a critical step that transforms unstructured log data into structured formats, facilitating subsequent log-based analysis. Traditional syntax-based log parsers are efficient and effective, but they often experience decreased accuracy when processing logs that deviate from the predefined rules. Recently, large language model (LLM)-based log parsers have shown superior parsing accuracy. However, existing LLM-based parsers face three main challenges: 1) time-consuming and labor-intensive manual labeling for fine-tuning or in-context learning, 2) increased parsing costs due to the vast volume of log data and limited context size of LLMs, and 3) privacy risks from using commercial models like ChatGPT with sensitive log information. To overcome these limitations, this paper introduces OpenLogParser, an unsupervised log parsing approach that leverages open-source LLMs (i.e., Llama3-8B) to enhance privacy and reduce operational costs while achieving state-of-the-art parsing accuracy. OpenLogParser first groups logs with similar static text but varying dynamic variables using a fixed-depth grouping tree. It then parses logs within these groups using three components: i) similarity scoring-based retrieval augmented generation: selects diverse logs within each group based on Jaccard similarity, helping the LLM distinguish between static text and dynamic variables; ii) self-reflection: iteratively queries the LLM to refine log templates and improve parsing accuracy; and iii) log template memory: stores parsed templates to reduce LLM queries for improved parsing efficiency. Our evaluation on LogHub-2.0 shows that OpenLogParser achieves 25% higher parsing accuracy and processes logs 2.7 times faster compared to state-of-the-art LLM-based parsers. In short, OpenLogParser addresses privacy and cost concerns of using commercial LLMs while achieving state-of-the-art parsing efficiency and accuracy.
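A minimal sketch of the Jaccard-based diverse-log selection idea the abstract describes (the function names and whitespace tokenization are my illustration, not OpenLogParser's actual code):
# Given a group of logs sharing a template, repeatedly pick the candidate
# least similar to the examples already selected, so the LLM sees diverse
# logs and can separate static text from dynamic variables.
def jaccard(a: set, b: set) -> float:
    return len(a & b) / len(a | b) if a | b else 1.0

def select_diverse(logs, k=3):
    tokens = [set(log.split()) for log in logs]
    selected = [0]                       # start from the first log
    while len(selected) < min(k, len(logs)):
        # pick the log whose max similarity to the selected set is smallest
        best = min(
            (i for i in range(len(logs)) if i not in selected),
            key=lambda i: max(jaccard(tokens[i], tokens[j]) for j in selected),
        )
        selected.append(best)
    return [logs[i] for i in selected]

group = [
    "connected to 10.0.0.1 port 22",
    "connected to 10.0.0.2 port 22",
    "connection from 192.168.1.5 closed",
]
print(select_diverse(group, k=2))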
Amman City, Jordan: Toward a Sustainable City from the Ground Up
The idea of smart cities (SCs) has gained substantial attention in recent years. The SC paradigm aims to improve citizens' quality of life and protect the city's environment. As we enter the age of next-generation SCs, it is important to explore all relevant aspects of the SC paradigm. In recent years, the advancement of Information and Communication Technologies (ICT) has produced a trend of endowing daily objects with smartness, aiming to make human life easier and more comfortable. The paradigm of SCs appears as a response to the goal of building the city of the future with advanced features. SCs still face many challenges in their implementation, but more and more studies on SCs are being carried out. Nowadays, different cities are employing SC features to enhance services or residents' quality of life. This work provides readers with useful and important information about Amman Smart City.
Sunday, July 21. 2024
Copy ISO Image to USB
# dd if=Downloads/iso/debian-testing-amd64-netinst.iso of=/dev/sda bs=1M status=progress conv=fdatasync
'conv=fdatasync' is equivalent to running 'sync' as a second command
watch kern.log to determine which device the USB drive appears as when inserted
ensure that the USB is not auto-mounted by any other application or service
Sunday, June 30. 2024
Someone Notes
For example (to name just a few items), a stock option pricing model is useless without the following; a minimal pricing sketch follows the list:
- holiday calendars
- ex dividend dates
- interest rate curves
- real-time stock prices
- corporate actions database
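To make the point concrete, here is a minimal Black-Scholes call sketch. The formula itself is the easy part; every input below has to come from one of the data sources listed above (all values here are placeholders):
from math import log, sqrt, exp
from statistics import NormalDist

# Black-Scholes call price with a continuous dividend yield q.
def bs_call(S, K, T, r, sigma, q=0.0):
    d1 = (log(S / K) + (r - q + 0.5 * sigma**2) * T) / (sigma * sqrt(T))
    d2 = d1 - sigma * sqrt(T)
    N = NormalDist().cdf
    return S * exp(-q * T) * N(d1) - K * exp(-r * T) * N(d2)

S = 100.0    # real-time stock price feed
K = 105.0    # contract strike; corporate actions (splits) adjust S and K
T = 0.5      # time to expiry in years -- needs the holiday calendar
r = 0.05     # from the interest rate curve at this tenor
sigma = 0.2  # implied/estimated volatility
q = 0.02     # dividend yield -- needs the ex-dividend dates
print(f"call price: {bs_call(S, K, T, r, sigma, q):.2f}")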
Thursday, May 30. 2024
Agriculture, Citizens' Assembly, Truth Seeking
Leveraging Time-Series Foundation Models in Smart Agriculture for Soil Moisture Forecasting
The recent surge in foundation models for natural language processing and computer vision has fueled innovation across various domains. Inspired by this progress, we explore the potential of foundation models for time-series forecasting in smart agriculture, a field often plagued by limited data availability. Specifically, this work presents a novel application of $\texttt{TimeGPT}$, a state-of-the-art (SOTA) time-series foundation model, to predict soil water potential ($\psi_\mathrm{soil}$), a key indicator of field water status that is typically used for irrigation advice. Traditionally, this task relies on a wide array of input variables. We explore $\texttt{TimeGPT}$'s ability to forecast $\psi_\mathrm{soil}$ in: ($i$) a zero-shot setting, ($ii$) a fine-tuned setting relying solely on historic $\psi_\mathrm{soil}$ measurements, and ($iii$) a fine-tuned setting where we also add exogenous variables to the model. We compare $\texttt{TimeGPT}$'s performance to established SOTA baseline models for forecasting $\psi_\mathrm{soil}$. Our results demonstrate that $\texttt{TimeGPT}$ achieves competitive forecasting accuracy using only historical $\psi_\mathrm{soil}$ data, highlighting its remarkable potential for agricultural applications. This research paves the way for foundation time-series models for sustainable development in agriculture by enabling forecasting tasks that were traditionally reliant on extensive data collection and domain expertise.
A citizens' assembly is a group of people who are randomly selected to represent a larger population in a deliberation. While this approach has successfully strengthened democracy, it has certain limitations that suggest the need for assemblies to form and associate more organically. In response, we propose federated assemblies, where assemblies are interconnected, and each parent assembly is selected from members of its child assemblies. The main technical challenge is to develop random selection algorithms that meet new representation constraints inherent in this hierarchical structure. We design and analyze several algorithms that provide different representation guarantees under various assumptions on the structure of the underlying graph.
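A toy sketch of the hierarchical selection idea: each parent-assembly seat is filled from members of the child assemblies, with a simple quota ensuring every child assembly is represented. This is my illustration only, not the paper's algorithms or their representation guarantees:
import random

def select_parent(children: dict[str, list[str]], size: int) -> list[str]:
    assert size >= len(children), "need at least one seat per child assembly"
    chosen = []
    # quota: one member sampled from every child assembly
    for name, members in children.items():
        chosen.append(random.choice(members))
    # remaining seats filled uniformly from everyone not yet chosen
    pool = [m for members in children.values() for m in members if m not in chosen]
    chosen += random.sample(pool, size - len(chosen))
    return chosen

children = {
    "north": ["n1", "n2", "n3"],
    "south": ["s1", "s2"],
    "east":  ["e1", "e2", "e3", "e4"],
}
print(select_parent(children, size=5))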
ChatGPT as the Marketplace of Ideas: Should Truth-Seeking Be the Goal of AI Content Governance?
As one of the most enduring metaphors within legal discourse, the marketplace of ideas has wielded considerable influence over the jurisprudential landscape for decades. A century after the inception of this theory, ChatGPT emerged as a revolutionary technological advancement in the twenty-first century. This research finds that ChatGPT effectively manifests the marketplace metaphor. It not only instantiates the promises envisaged by generations of legal scholars but also lays bare the perils discerned through sustained academic critique. Specifically, the workings of ChatGPT and the marketplace of ideas theory exhibit at least four common features: arena, means, objectives, and flaws. These shared attributes are sufficient to render ChatGPT historically the most qualified engine for actualizing the marketplace of ideas theory.
The comparison of the marketplace theory and ChatGPT merely marks a starting point. A more meaningful undertaking entails reevaluating and reframing both internal and external AI policies by referring to the accumulated experience, insights, and suggestions researchers have raised to fix the marketplace theory. Here, a pivotal issue is: should truth-seeking be set as the goal of AI content governance? Given the unattainability of the absolute truth-seeking goal, I argue against adopting zero-risk policies. Instead, a more judicious approach would be to embrace a knowledge-based alternative wherein large language models (LLMs) are trained to generate competing and divergent viewpoints based on sufficient justifications. This research also argues that so-called AI content risks are not created by AI companies but are inherent in the entire information ecosystem. Thus, the burden of managing these risks should be distributed among different social actors, rather than being solely shouldered by chatbot companies.
Why Algorithms Remain Unjust: Power Structures Surrounding Algorithmic Activity
Algorithms play an increasingly significant role in our social lives. Unfortunately, they often perpetuate social injustices while doing so. The popular means of addressing these algorithmic injustices has been through algorithmic reformism: fine-tuning the algorithm itself to be more fair, accountable, and transparent. While commendable, the emerging discipline of critical algorithm studies shows that reformist approaches have failed to curtail algorithmic injustice because they ignore the power structure surrounding algorithms. Heeding calls from critical algorithm studies to analyze this power structure, I employ a framework developed by Erik Olin Wright to examine the configuration of power surrounding Algorithmic Activity: the ways in which algorithms are researched, developed, trained, and deployed within society. I argue that the reason Algorithmic Activity is unequal, undemocratic, and unsustainable is that the power structure shaping it is one of economic empowerment rather than social empowerment. For Algorithmic Activity to be socially just, we need to transform this power configuration to empower the people at the other end of an algorithm. To this end, I explore Wright's symbiotic, interstitial, and ruptural transformations in the context of Algorithmic Activity, as well as how they may be applied in a hypothetical research project that uses algorithms to address a social issue. I conclude with my vision for socially just Algorithmic Activity, asking that future work strives to integrate the proposed transformations and develop new mechanisms for social empowerment.
Sunday, May 26. 2024
Useful Debian Packaging Query
$ ucfq /etc/ssh/sshd_config
Configuration file          Package           Exists   Changed
/etc/ssh/sshd_config        openssh-server    Yes      No
Tuesday, April 2. 2024
Papers
The State of Lithium-Ion Battery Health Prognostics in the CPS Era
Lithium-ion batteries (Li-ion) have revolutionized energy storage technology, becoming integral to our daily lives by powering a diverse range of devices and applications. Their high energy density, fast power response, recyclability, and mobility advantages have made them the preferred choice for numerous sectors. This paper explores the seamless integration of Prognostics and Health Management (PHM) within batteries, presenting a multidisciplinary approach that enhances the reliability, safety, and performance of these powerhouses. Remaining useful life (RUL), a critical concept in prognostics, is examined in depth, emphasizing its role in predicting component failure before it occurs. The paper reviews various RUL prediction methods, from traditional models to cutting-edge data-driven techniques. Furthermore, it highlights the paradigm shift toward deep learning architectures within the field of Li-ion battery health prognostics, elucidating the pivotal role of deep learning in addressing battery system complexities. Practical applications of PHM across industries are also explored, offering readers insights into real-world implementations. This paper serves as a comprehensive guide, catering to both researchers and practitioners in the field of Li-ion battery PHM.
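As a toy illustration of the RUL concept, one can fit a capacity-fade trend to (synthetic) cycle data and extrapolate to an end-of-life threshold, commonly 80% of nominal capacity; real prognostics use far richer models, as the survey discusses:
import numpy as np

# Synthetic capacity-fade data: linear degradation plus noise.
cycles = np.arange(0, 500, 50)
capacity = 1.0 - 0.0003 * cycles + np.random.default_rng(0).normal(0, 0.002, len(cycles))

# Fit the trend and extrapolate to the end-of-life threshold.
slope, intercept = np.polyfit(cycles, capacity, 1)
eol_threshold = 0.8                          # end of life: 80% of nominal
eol_cycle = (eol_threshold - intercept) / slope
rul = eol_cycle - cycles[-1]
print(f"estimated end of life at cycle {eol_cycle:.0f}, RUL ~ {rul:.0f} cycles")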
The New Agronomists: Language Models are Experts in Crop Management, github
Crop management plays a crucial role in determining crop yield, economic profitability, and environmental sustainability. Despite the availability of management guidelines, optimizing these practices remains a complex and multifaceted challenge. In response, previous studies have explored using reinforcement learning with crop simulators, typically employing simple neural-network-based reinforcement learning (RL) agents. Building on this foundation, this paper introduces a more advanced intelligent crop management system. This system uniquely combines RL, a language model (LM), and crop simulations facilitated by the Decision Support System for Agrotechnology Transfer (DSSAT). We utilize deep RL, specifically a deep Q-network, to train management policies that process numerous state variables from the simulator as observations. A novel aspect of our approach is the conversion of these state variables into more informative language, facilitating the language model's capacity to understand states and explore optimal management practices. The empirical results reveal that the LM exhibits superior learning capabilities. Through simulation experiments with maize crops in Florida (US) and Zaragoza (Spain), the LM not only achieves state-of-the-art performance under various evaluation metrics but also demonstrates a remarkable improvement of over 49% in economic profit, coupled with reduced environmental impact when compared to baseline methods.
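A sketch of what "converting state variables into more informative language" could look like; the variable names and phrasing below are hypothetical, not DSSAT's actual outputs or the paper's prompts:
# Turn raw simulator state variables into a textual observation for the LM.
def state_to_text(state: dict) -> str:
    return (
        f"Day {state['day']} after planting. "
        f"Soil moisture is at {state['soil_moisture_pct']:.0f}% of capacity, "
        f"cumulative nitrogen applied is {state['n_applied_kg_ha']} kg/ha, "
        f"and the crop is in the {state['growth_stage']} stage."
    )

state = {"day": 42, "soil_moisture_pct": 63.0,
         "n_applied_kg_ha": 80, "growth_stage": "flowering"}
print(state_to_text(state))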
DHNet: A Distributed Network Architecture for Smart Home
With the increasing popularity of smart homes, more and more devices need to connect to home networks. Traditional home networks mainly rely on centralized networking, where an excessive number of devices in the centralized topology can increase the pressure on the central router, potentially leading to decreased network performance metrics such as communication latency. To address the latency performance issues brought about by centralized networks, this paper proposes a new network system called DHNet, and designs an algorithm for clustering networking and communication based on vector routing. Communication within clusters in a simulated virtual environment achieves a latency of approximately 0.7 milliseconds. Furthermore, by directly using the first non-"lo" network card address of a device as the protocol's network layer address, the protocol avoids the several tens of milliseconds of access latency caused by DHCP. The integration of service discovery functionality into the network layer protocol is achieved through a combination of "server-initiated service push" and "client request + server reply" methods. Compared to traditional application-layer DNS passive service discovery, the average latency is reduced by over 50%. The PVH protocol is implemented in the user space using the Go programming language, with implementation details drawn from Google's gVisor project. The code has been ported from x86_64 Linux computers to devices such as OpenWrt routers and Android smartphones. The PVH protocol can communicate through "tunnels" to provide IP compatibility, allowing existing applications based on TCP/IP to communicate using the PVH protocol without requiring modifications to their code.
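The protocol itself is written in Go, but as a sketch of the addressing idea, here is one way to pick the first non-"lo" interface's IPv4 address on Linux (my illustration, not DHNet's code):
import fcntl, socket, struct

def interface_ipv4(ifname: str) -> str:
    s = socket.socket(socket.AF_INET, socket.SOCK_DGRAM)
    # SIOCGIFADDR ioctl returns the interface's IPv4 address (Linux only)
    packed = fcntl.ioctl(s.fileno(), 0x8915,
                         struct.pack("256s", ifname[:15].encode()))
    return socket.inet_ntoa(packed[20:24])

def first_non_lo_address() -> str:
    for _, name in sorted(socket.if_nameindex()):
        if name != "lo":
            try:
                return interface_ipv4(name)
            except OSError:
                continue   # interface has no IPv4 address
    raise RuntimeError("no non-loopback interface with an IPv4 address")

print(first_non_lo_address())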
Saturday, March 23. 2024
Linux Good Old Stuff
- Linux Virtual Server - a highly scalable and highly available server built on a cluster of real servers, with the load balancer running on the Linux operating system (last date on the web page: 2012)
- Linux-VServer - provides virtualization for GNU/Linux systems. This is accomplished by kernel-level isolation. It allows running multiple virtual units at once. Those units are sufficiently isolated to guarantee the required security, but utilize available resources efficiently, as they run on the same kernel. (a precursor to LXC) (last mod 2018)
Sunday, February 4. 2024
C++ Header File Statistics
From #include <rules>, via Hacker News, two compile-time options to consider:
- use the preprocessor output (cl /E, gcc -E)
- use the include output (cl /showIncludes, gcc -M), gather the codebase statistics (average size after preprocessing, most included header files, header files with largest payload, etc.)
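A rough sketch of gathering such statistics from gcc -M output; the flags, paths, and file globs are illustrative and will need adjusting for a real codebase:
import subprocess, sys
from collections import Counter
from pathlib import Path

counts = Counter()
for src in Path(sys.argv[1] if len(sys.argv) > 1 else ".").rglob("*.cpp"):
    out = subprocess.run(["gcc", "-M", str(src)],
                         capture_output=True, text=True)
    if out.returncode != 0:
        continue     # missing includes, wrong flags, etc.
    # gcc -M emits a make rule "target: dep1 dep2 \"; collect the deps
    deps = out.stdout.replace("\\\n", " ").split(":", 1)[1].split()
    counts.update(d for d in deps if d.endswith((".h", ".hpp")))

# print the ten most included header files
for header, n in counts.most_common(10):
    print(f"{n:5d}  {header}")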
I've been doing this backwards, though I don't understand why:
The header file named after the source should be included first (to catch errors in the header)
Wednesday, January 24. 2024
Sleep
SPAND: Sleep Prediction Architecture using Network Dynamics
Sleep behavior significantly impacts health and acts as an indicator of physical and mental well-being. Monitoring and predicting sleep behavior with ubiquitous sensors may therefore assist in both sleep management and tracking of related health conditions. While sleep behavior depends on, and is reflected in, the physiology of a person, it is also impacted by external factors such as digital media usage, social network contagion, and the surrounding weather. In this work, we propose SPAND (Sleep Prediction Architecture using Network Dynamics), a system that exploits social contagion in sleep behavior through graph networks and integrates it with physiological and phone data extracted from ubiquitous mobile and wearable devices for predicting next-day sleep labels about sleep duration. Our architecture overcomes the limitations of large-scale graphs containing connections irrelevant to sleep behavior by devising an attention mechanism. The extensive experimental evaluation highlights the improvement provided by incorporating social networks in the model. Additionally, we conduct robustness analysis to demonstrate the system's performance in real-life conditions. The outcomes affirm the stability of SPAND against perturbations in input data. Further analyses emphasize the significance of network topology in prediction performance, revealing that users with higher eigenvalue centrality are more vulnerable to data perturbations.
Sunday, January 21. 2024
Boehm Garbage Collection, Cords String Handling
HackerNews had an article about the Boehm-Demers-Weiser conservative C/C++ Garbage Collector, which leads to A garbage collector for C and C++.
It can be used in garbage collection mode or leak detection mode.
The garbage collector distribution includes a C string (cord) package that provides for fast concatenation and substring operations on long strings. A simple curses- and win32-based editor that represents the entire file as a cord is included as a sample application. From Wikipedia:
Boehm GC is also distributed with a C string handling library called cords. This is similar to ropes in C++ (trees of constant small arrays), but instead of using reference counting for proper deallocation, it relies on garbage collection to free objects. Cords are good at handling very large texts, modifications to them in the middle, slicing, concatenating, and keeping history of changes (undo/redo functionality).
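A toy rope ("cord") in Python to illustrate the idea: concatenation just builds a new tree node, so joining long strings is O(1) and old versions remain intact (handy for undo/redo). The real cord package is C and leans on the collector to reclaim unused nodes; Python's own GC plays that role here:
# Minimal rope: leaves hold plain strings, internal nodes hold two children.
class Cord:
    def __init__(self, left, right=None):
        self.left, self.right = left, right
        self.length = (len(left) if right is None
                       else left.length + right.length)

    def __add__(self, other):            # O(1) concatenation: one new node
        return Cord(self, other)

    def __str__(self):                   # flatten only when actually needed
        if self.right is None:
            return self.left
        return str(self.left) + str(self.right)

a = Cord("hello, ")
b = Cord("world")
c = a + b          # no copying of the underlying text; a and b still usable
print(str(c), c.length)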
Code can be found at github - The Boehm-Demers-Weiser conservative C/C++ Garbage Collector (bdwgc, also known as bdw-gc, boehm-gc, libgc)