Although it is hard to imagine how ones and zeros relate to amino acid letter sequences, scientists who have long been immersed in reading the human genome already have valuable insights.
History has taught us that meddling with natural bodily processes using electronic devices isn’t without risks, as demonstrated by some past implantable devices that should never have left the lab. Nevertheless, medical electronics engineering, and especially biomedical engineering, is brimming with potential: it even extends to improving bacterial immune defences. And given our experiences of the COVID-19 pandemic, it has never been clearer that we need to learn more about diseases, and that the application of electronics engineering to life sciences is essential.
Important Considerations in Bioinformatics
Bioinformatics is an interdisciplinary research field that deals with the capture and analysis of biological data. It draws on disciplines such as computer science, mathematics, physics, and biology, as well as medicine, which has been a major driver of the advancements made in human genome research.
A DNA fact sheet by the National Human Genome Research Institute (NHGRI)
Image credit: NHGRI
Most of the data about the human blueprint was gathered by the Human Genome Project. The project established that the twisted helix of the metre-long DNA (deoxyribonucleic acid) molecule is made up of about three billion nucleotide pairs. Written out as a list, these three billion base pairs would fill around 10,000 epic-sized novels. On top of this, genes make up only about 1.5% of the DNA sequence: there is far more information that researchers haven’t yet investigated that could potentially play a significant role in how the human genome evolves and gives shape to each particular human.
As each human’s genetic blueprint is unique, personalised medicine could deliver unprecedented benefits compared to conventional generalised therapies. If electronics engineers want to provide the best solutions for such a delicate subject, they need to exercise particular diligence, care, and caution in how they handle sensitive biological data.
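As a rough back-of-the-envelope check, the figures quoted above can be turned into a few numbers that mean more to an engineer. The sketch below uses the three billion base pairs, 10,000 novels, and 1.5% from the text; the 2-bits-per-base packing is an assumption made here for illustration (four letters fit in two bits), not a description of any real file format.

```python
# Figures quoted in the article text.
BASE_PAIRS = 3_000_000_000
NOVELS = 10_000
CODING_FRACTION = 0.015  # "1.5%"


def letters_per_novel(base_pairs=BASE_PAIRS, novels=NOVELS):
    """Letters each 'novel' would hold if the genome were split evenly."""
    return base_pairs // novels


def packed_size_megabytes(base_pairs=BASE_PAIRS, bits_per_base=2):
    """Size in megabytes, ASSUMING each base is packed into 2 bits."""
    return base_pairs * bits_per_base / 8 / 1_000_000


def coding_base_pairs(base_pairs=BASE_PAIRS, fraction=CODING_FRACTION):
    """Base pairs that fall inside genes, using the quoted fraction."""
    return round(base_pairs * fraction)


print(letters_per_novel())       # 300000 letters per novel
print(packed_size_megabytes())   # 750.0 MB under the 2-bit assumption
print(coding_base_pairs())       # 45000000 base pairs
```

Under these assumptions, a single genome fits comfortably on a memory card; the difficulty lies in interpreting it, not in storing it.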
In the following subsections, we look at some of the many considerations that researchers need to take into account.
An infographic that refers to a DNA microarray technology being used as a clinical diagnostics tool for cancer research
Image credit: NHGRI
DNA Microarrays
DNA microarrays, also known as DNA microchips, are an inexpensive way to analyse thousands of gene sequences at once. Because they can process a large volume of data, they are used as clinical diagnostics tools. (One example is their use in cancer studies, owing to the microarrays’ ability to visually present the differences between normal tissue and tumour tissue.)
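One common way to express the difference between normal and tumour tissue numerically is a log2 fold change per gene. The following is a minimal sketch of that idea, not a description of any particular microarray pipeline: the gene names, the intensity values, the pseudocount, and the threshold are all made up for illustration.

```python
import math


def log2_fold_changes(normal, tumour, pseudocount=1.0):
    """Return {gene: log2(tumour / normal)} for genes present in both samples.

    The pseudocount (an assumption) avoids division by zero for silent spots.
    """
    return {
        gene: math.log2((tumour[gene] + pseudocount) / (normal[gene] + pseudocount))
        for gene in normal
        if gene in tumour
    }


def flag_genes(fold_changes, threshold=1.0):
    """Genes whose expression at least doubled or halved (|log2 FC| >= 1)."""
    return sorted(g for g, fc in fold_changes.items() if abs(fc) >= threshold)


# Hypothetical spot intensities, not real measurements.
normal = {"GENE_A": 99.0, "GENE_B": 399.0, "GENE_C": 199.0}
tumour = {"GENE_A": 399.0, "GENE_B": 99.0, "GENE_C": 209.0}

changes = log2_fold_changes(normal, tumour)
print(flag_genes(changes))  # ['GENE_A', 'GENE_B']
```

A positive value means the gene is more active in the tumour sample, a negative value means it is less active. Real analyses would also normalise the arrays and test for statistical significance.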
Genome Sequencing
The primary method for collecting genome data is genome sequencing (which was traditionally carried out with gel electrophoresis). Genome sequencing involves reading the genetic letters A, C, G, and T (respectively: adenine, cytosine, guanine, and thymine) that make up the DNA. Uncovering the meaning of what are essentially the ‘words’ and ‘sentences’ that the four letters form is where it becomes more complex.
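One way to picture the ‘words’ is as codons: in protein-coding regions, each group of three letters stands for one amino acid. The sketch below translates a short DNA string with the standard genetic code. It is a simplified illustration that assumes the sequence is already a coding strand read in the correct frame; the `translate` helper and the example sequence are introduced here and are not from the article.

```python
from itertools import product

# Standard genetic code, listed in the conventional T, C, A, G order.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): amino_acid
    for codon, amino_acid in zip(product(BASES, repeat=3), AMINO_ACIDS)
}


def translate(dna):
    """Translate a coding DNA string into one-letter amino acid codes.

    Reading stops at the first stop codon ('*'); codons containing
    unexpected letters become 'X'. Trailing bases that do not fill a
    whole codon are ignored.
    """
    dna = dna.upper()
    protein = []
    for i in range(0, len(dna) - 2, 3):
        amino_acid = CODON_TABLE.get(dna[i:i + 3], "X")
        if amino_acid == "*":
            break
        protein.append(amino_acid)
    return "".join(protein)


print(translate("ATGGCCAAGTAA"))  # MAK
```

Translation is the easy part. Working out where genes start and stop, and what the remaining non-coding sequence does, is where most of the complexity lies.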
An annotated illustration that refers to an automated DNA sequencing process that involves capillary electrophoresis
Image credit: Memorial University
Modern genome sequencing is done with various automatic sequencing machines, including robotic capillary sequencers. Preparing DNA for ‘reading’ involves chemically modifying and fluorescently tagging batches of fragments, which are then scanned by a laser and read by a computer that presents the resulting DNA sequence to the researcher.
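The ‘read by a computer’ step is often called base calling. As a heavily simplified sketch, assume the scanner yields one fluorescence intensity per letter at each position; the software then picks the strongest channel and marks the position as ambiguous (`N`) when no channel clearly dominates. The data layout, the `min_ratio` threshold, and the numbers are assumptions for illustration; real base callers model peak shapes and report quality scores.

```python
def call_bases(peaks, min_ratio=2.0):
    """Call one base per position from four-channel intensities.

    peaks: list of dicts mapping 'A', 'C', 'G', 'T' to an intensity.
    A base is called only if the strongest channel is at least
    min_ratio times the runner-up; otherwise 'N' is emitted.
    """
    called = []
    for intensities in peaks:
        ranked = sorted(intensities.items(), key=lambda kv: kv[1], reverse=True)
        (best_base, best), (_, second) = ranked[0], ranked[1]
        if best > 0 and best >= min_ratio * second:
            called.append(best_base)
        else:
            called.append("N")
    return "".join(called)


# Hypothetical intensities, not real scanner output.
peaks = [
    {"A": 900, "C": 40, "G": 30, "T": 20},
    {"A": 25, "C": 850, "G": 60, "T": 10},
    {"A": 400, "C": 30, "G": 380, "T": 15},  # two channels nearly tied
    {"A": 10, "C": 20, "G": 15, "T": 700},
]
print(call_bases(peaks))  # ACNT
```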
Finishing Methods
For researchers, interpreting the DNA sequences is the more challenging part of bioinformatics. One part of this task is called ‘finishing’ and involves the removal of errors and ambiguities in the sequence. Next-generation sequencing workflows include in-silico finishing, that is, finishing carried out computationally.
Computer methods are used not only to translate biological data into a computer language but also to search, compare, and classify sequences. One example is BLAST (Basic Local Alignment Search Tool), a sequence-similarity search tool hosted by the NCBI (National Center for Biotechnology Information), which helps genome researchers to compare their sequences against public databases and conduct genomics analysis.
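One simple way to remove errors and ambiguities computationally is a majority vote across overlapping reads of the same region. The sketch below assumes the reads are already aligned and of equal length, which in practice is the hard part; the `consensus` function, the `min_agreement` threshold, and the reads are illustrative assumptions rather than the method of any specific finishing tool.

```python
from collections import Counter


def consensus(aligned_reads, min_agreement=0.6):
    """Build a consensus sequence from equal-length, pre-aligned reads.

    At each position the most common base wins if it accounts for at
    least min_agreement of the non-'N' votes; otherwise 'N' is kept.
    """
    if not aligned_reads:
        return ""
    length = len(aligned_reads[0])
    if any(len(read) != length for read in aligned_reads):
        raise ValueError("reads must be aligned to the same length")

    result = []
    for column in zip(*aligned_reads):
        votes = Counter(base for base in column if base != "N")
        if not votes:
            result.append("N")
            continue
        base, count = votes.most_common(1)[0]
        share = count / sum(votes.values())
        result.append(base if share >= min_agreement else "N")
    return "".join(result)


# Hypothetical reads: one ambiguous call and two single-base errors.
reads = ["ACGTAC", "ACGTNC", "ACCTAC", "ACGTAG"]
print(consensus(reads))  # ACGTAC
```

Positions where the reads genuinely disagree stay marked as `N`, which flags them for re-sequencing instead of hiding the uncertainty.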
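To give a feel for what comparing sequences means, here is a minimal local alignment scorer in the style of the Smith-Waterman algorithm. This is not how BLAST works internally (BLAST uses faster heuristics to approximate this kind of result), and the scoring values, function names, and toy database are assumptions chosen for illustration.

```python
def local_alignment_score(a, b, match=2, mismatch=-1, gap=-2):
    """Best local alignment score between a and b (Smith-Waterman style).

    Only two rows of the dynamic programming table are kept, since only
    the score, not the alignment itself, is returned.
    """
    prev = [0] * (len(b) + 1)
    best = 0
    for ch_a in a:
        curr = [0]
        for j, ch_b in enumerate(b, start=1):
            diagonal = prev[j - 1] + (match if ch_a == ch_b else mismatch)
            score = max(0, diagonal, prev[j] + gap, curr[j - 1] + gap)
            curr.append(score)
            best = max(best, score)
        prev = curr
    return best


def best_match(query, database):
    """Name of the database entry that scores highest against query."""
    return max(database, key=lambda name: local_alignment_score(query, database[name]))


# Hypothetical toy database.
database = {"seq1": "GGGGGGGG", "seq2": "TTACGTTT", "seq3": "ACGA"}
print(local_alignment_score("ACGT", "ACGT"))  # 8
print(best_match("ACGT", database))           # seq2
```

An exhaustive comparison like this scales with the product of the sequence lengths, which is why tools that search very large databases rely on heuristics.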
Translating Other Biological Data
DNA is only a part (albeit an essential one) of the biological data required in bioinformatics. Physiological processes also produce a data deluge. Data such as ECG (electrocardiogram) readings, blood pressure, and skin conductance have been measured and analysed for decades. More recently, physiological data has been gathered with eye-tracking diagnostics tools or body-mounted sensors.
Such sensors can, however, cause discomfort to the participant. Even with less invasive methods of bio-data collection, such as video recording or EMG (electromyography), making sense of and analysing the data remained difficult until recent improvements made with machine learning.
‘Omics’ Data: a Holistic View of a Biological System
The latest research on the human genome and other biological information includes ‘omics’ data, or data collected via various biological methods from individual genomes and molecular profiles. Analysing such data, in addition to the relevant electronic medical records, could help provide a more comprehensive health profile of a patient and ultimately allow them to receive treatment that is tailored to them.
Addressing Data Challenges with Machine Learning
Drawing conclusions from biological data of various sizes and formats is a serious challenge. But now we have new computational analysis and bio-data handling methods that, when used responsibly, could contribute to a vast improvement in human health. A more accurate application of AI in medtech is desirable, especially when dealing with deadly diseases.
An illustration of various ‘omics’ sequencing techniques that are achieved with machine learning, including phenotype data/metadata analysis
Image credit: Genome Biology, BioMedCentral
Indeed, bioinformatics is a new discipline that certainly needs an advanced computational approach, and this is where machine learning comes in.
Machine learning is based on sophisticated statistical models that improve with experience. Harmonised data handling is therefore key to building that experience and recognising critical data patterns. In addition, data scientists must continue to carefully consider the formats and methods for data transfer, data format conversion, algorithms, and data output methods.
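As a small illustration of what harmonised data handling can mean in practice, the sketch below maps records from two differently formatted sources onto one schema and then standardises a feature so that a learning algorithm sees comparable numbers. The field names (`temp`, `temp_unit`, `heart_rate`, `pulse_bpm`) and the example records are hypothetical and not taken from any real device or dataset.

```python
from statistics import mean, pstdev


def harmonise(record):
    """Map a raw record to a common schema: degrees Celsius and beats per minute."""
    temperature = record["temp"]
    if record.get("temp_unit", "C").upper() == "F":
        temperature = (temperature - 32) * 5 / 9  # Fahrenheit to Celsius
    # Hypothetical sources name the same measurement differently.
    heart_rate = record["heart_rate"] if "heart_rate" in record else record["pulse_bpm"]
    return {"temp_c": round(temperature, 2), "heart_rate_bpm": float(heart_rate)}


def standardise(values):
    """Rescale values to zero mean and unit variance (z-scores)."""
    mu, sigma = mean(values), pstdev(values)
    if sigma == 0:
        return [0.0 for _ in values]
    return [(value - mu) / sigma for value in values]


# Hypothetical records from two sources with different conventions.
raw = [
    {"temp": 98.6, "temp_unit": "F", "pulse_bpm": 72},
    {"temp": 38.2, "heart_rate": 95},
]
clean = [harmonise(record) for record in raw]
print(clean[0])  # {'temp_c': 37.0, 'heart_rate_bpm': 72.0}
print(standardise([row["heart_rate_bpm"] for row in clean]))
```

Without a step like this, a model could mistake a change of units or field names for a biological signal.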
Both the Pitfalls and Potential of Bioinformatics
Already, the tremendous amount of existing biological data holds plenty of new prospects. However, there is a highly complex input-throughput-output relationship among the data that go into any bioinformatics architecture.
Indeed, many fringe areas between bioinformatics and bioelectronics are subject to rigorous critical review, and for good reason. Implantables and wearables must be error-proof, increasingly so for therapeutic and pharmacological purposes, and they therefore require all bioinformatics researchers to err on the side of caution.
Nonetheless, the whole research area is gathering speed, and combined breakthroughs from nanoelectronics and bioinformatics will hopefully be saving lives soon enough.