Opinion

Bharat needs Demographic Data Engineering & Demographic Data Intelligence to replace Census & serve a larger purpose – A Computer Engineer’s way of looking at things related to Indigenous Faiths – Part 1

People who look at census data generally assume that those with Hindu names are Hindus, and practicing Hindus at that; this is how the census operates in India

Venkat Sudheendra

September 29, 2021


Key findings of the religious composition of India

Every ten years, the Government is mandated to conduct the census, a count of all its residents or, to be specific, its citizens.

The one question that comes up during the census process, and after every census, is the religious composition of India’s population. The primary reason this question has come up in recent times is the ‘alleged’ unabated, unchecked conversion of people from the native Indigenous Dharmic faiths to the Abrahamic faiths.

Followers of the Indigenous faiths, especially Hindus, are eager to know the actual numbers because they have been the main victims of reckless, unchecked conversions, whether forced or induced through sops, monetary benefits, or other insidious means with an ulterior objective.

The 2011 census showed that, for the first time since Independence, the Hindu share of the population had fallen below 80 percent, to 79.8% from 80.5% in the 2001 census (not counting the other Indigenous faiths, i.e. the Jains, Buddhists, and Sikhs).

As a percentage, this fall might seem insignificant, but given the size of India’s population it is huge in absolute numbers, perhaps equivalent to the population of a North Eastern state.
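To put the 0.7 percentage-point fall in perspective, here is a back-of-the-envelope calculation. It assumes the final 2011 census total of about 1.21 billion people and simply converts the gap in shares into a head count; by itself it says nothing about why the share changed.

```python
# Back-of-the-envelope: how many people does the 2001 -> 2011 drop in the
# Hindu share correspond to, at the 2011 population size?
TOTAL_POPULATION_2011 = 1_210_854_977  # Census of India 2011, total population
HINDU_SHARE_2001 = 80.5  # percent, as quoted above
HINDU_SHARE_2011 = 79.8  # percent, as quoted above


def share_drop_in_people(old_pct: float, new_pct: float, population: int) -> int:
    """Express a drop in percentage points as an approximate head count."""
    return round(population * (old_pct - new_pct) / 100)


people = share_drop_in_people(HINDU_SHARE_2001, HINDU_SHARE_2011, TOTAL_POPULATION_2011)
print(f"{HINDU_SHARE_2001 - HINDU_SHARE_2011:.1f} percentage points ~ {people:,} people")
```

The result, roughly 8.5 million people, is larger than the 2011 population of every North Eastern state except Assam.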

This fall needs to be analyzed against historical trends and by gender, region, age group, and much more, but the census data has many gaps that even an expert analyst cannot explain.

Consider a problem statement:

How many people of Indigenous faiths, across both genders, have converted over the past 10 years, year on year, and what percentage of the growth of other faiths in the country, or in a specific region or district, do they account for?

Advancing the question to the next level:

Is there any correlation between the income of people of Indigenous faith and their propensity to convert to an alien faith?

The above questions cannot be answered with census data, even though the census captures religious and gender profiles. The primary reason is that census data is discrete, not continuous: it is one-time, static information that holds little value in modern times.

In simple terms, the census presumes that ‘Ravi Kumar’ and ‘Ranjani’ (random Hindu names) belong to the Indigenous faith of their birth even if they are practicing Christians, unless they voluntarily disclose that they are Christian. There may be many ‘Ranjanis’ and ‘Ravi Kumars’ hiding their true identity as ‘Ranjani Christopher’ and ‘Ravikumar Fernandez’.

Counting Ravikumar and Ranjani as part of the Indigenous faith population is one of the prime reasons why the Indigenous faiths’ numbers are inflated. Going by these inflated numbers, native Hindu society also believes that ‘all is well’, not realizing that a silent demographic shift is under way through covert means, namely conversions.

People who look at the raw census data generally assume that those with Hindu names are Hindus, and practicing Hindus at that; this is how the census operates in this country.

People with Indigenous names who practice an Abrahamic faith may well be counted as part of the Indigenous faith. This happens in every census, and there is no way to check, verify, or even validate such records.

Also, those who have converted from Indigenous faiths do not divulge that they practice an alien faith, so as not to lose the reservation benefits to which certain sections of the Indigenous faiths are entitled. This is not just fraud; it also denies the rightful transfer of benefits to the deserving.

To address these problems with religious demographic profiles and the discrepancies in data capture, we need to replace the census with what I term ‘Demographic Data Engineering’.

Before we look at what Demographic Data Engineering is, we need an overview of Data Engineering.[1]

Data Engineering is similar to Software Engineering, except that its output is clean, streamlined, and organized data that can be used for all sorts of data and scientific analysis.

The main requirement for Data Engineering is a steady stream of data, captured in its entirety with both its quantifiable and non-quantifiable aspects.

To give a simple example: if 10 people go to a supermarket and each buys 1 item (assuming every item is unique), the supermarket’s database will hold 10 records, each capturing the item, its category, its price, its quantity, and when and by whom it was bought. Now suppose the supermarket has 3 more branches and 30 people visit those branches the next day; that means 30 new entries in their local databases. Here the quantifiable aspects are the price and quantity, and the non-quantifiable ones are the category, the customer, the locality, and the time. In Data Engineering terms, the quantifiable aspects are called “Facts” and the non-quantifiable ones “Dimensions”.
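The supermarket example maps onto what data warehouses call a star schema: a fact table of measurable values that points at dimension tables. Below is a minimal Python sketch of that shape; the class names, item names, categories, and prices are all hypothetical, and the customer dimension is left out for brevity.

```python
from dataclasses import dataclass
from datetime import date


# Dimensions: the descriptive, non-quantifiable context of a sale.
@dataclass(frozen=True)
class Item:
    item_id: int
    name: str
    category: str


@dataclass(frozen=True)
class Store:
    store_id: int
    locality: str


# Fact: one row per purchase. Quantity and price are the measures; the ids
# point at dimension rows (a customer dimension is omitted here), and the
# sale date is itself a dimension.
@dataclass(frozen=True)
class SaleFact:
    sale_date: date
    customer_id: int
    item_id: int
    store_id: int
    quantity: int
    unit_price: float

    @property
    def amount(self) -> float:
        return self.quantity * self.unit_price


# Ten customers each buy one distinct item at one branch on one day.
items = {i: Item(i, f"item-{i}", "grocery" if i % 2 else "dairy") for i in range(10)}
stores = {1: Store(1, "Main Branch")}
facts = [
    SaleFact(date(2021, 9, 1), customer_id=c, item_id=c, store_id=1,
             quantity=1, unit_price=10.0 + c)
    for c in range(10)
]

# A typical analytical question: revenue per category, answered by joining
# each fact to its item dimension.
revenue_by_category = {}
for f in facts:
    category = items[f.item_id].category
    revenue_by_category[category] = revenue_by_category.get(category, 0.0) + f.amount
```

Keeping the measures in one narrow table and the descriptive attributes in separate ones is what lets the same facts be sliced by category, locality, or date without duplicating the descriptions on every row.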

So a continuous stream of records, keyed by transaction date, is captured at each store. It can then be cleaned, compiled, loaded into a larger repository, and analyzed to find buying patterns, the demographics of the area, the propensity towards an item, and much more.

This process of extracting, cleaning, organizing, and compiling a steady stream of data into a repository where it can be analyzed, using a framework, is commonly called ETL (Extract, Transform, Load), a major part of Data Engineering. Data Engineering is also the broader discipline that encompasses subjects like Data Warehousing and Data Mining.[2][3]
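A toy ETL pipeline for the supermarket branches might look like the sketch below. It is an illustration under stated assumptions, not any particular ETL framework: the branch names, row format, and values are invented, and the "warehouse" is just an in-memory list standing in for a real data warehouse.

```python
from collections import defaultdict

# Source data: each branch's local database, shown as raw string rows of
# (date, item, category, quantity, unit_price). Format and values are invented.
branch_rows = {
    "branch-1": [
        ("2021-09-01", "Milk ", "Dairy", "2", "25.0"),
        ("2021-09-01", "Rice", "Grocery", "1", "60.0"),
    ],
    "branch-2": [
        ("2021-09-02", "Curd", "dairy", "1", "30.0"),
        ("2021-09-02", "Dal", "grocery", "0", "90.0"),  # bad row: zero quantity
    ],
}


def extract(branches):
    """Extract: pull raw rows out of every branch, tagging each with its source."""
    for branch, rows in branches.items():
        for row in rows:
            yield branch, row


def transform(records):
    """Transform: fix types, normalise text, drop invalid rows, derive amount."""
    for branch, (day, item, category, qty, price) in records:
        quantity = int(qty)
        if quantity <= 0:
            continue
        yield {
            "date": day,
            "branch": branch,
            "item": item.strip().lower(),
            "category": category.strip().lower(),
            "quantity": quantity,
            "amount": quantity * float(price),
        }


def load(rows, warehouse):
    """Load: append the cleaned rows to the central repository (a list here)."""
    warehouse.extend(rows)
    return warehouse


warehouse = load(transform(extract(branch_rows)), [])

# Analysis now runs on one clean, combined table instead of separate local ones.
sales_by_category = defaultdict(float)
for row in warehouse:
    sales_by_category[row["category"]] += row["amount"]
```

Using generators means rows flow through extract, transform, and load one at a time, which is the same shape a streaming pipeline has when new records keep arriving.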

Only with a steady, continuous stream of data loaded at regular intervals will we have enough data to substantially validate the hypotheses that statisticians would test with various statistical and Machine Learning models. Mere capture of data at one particular instant adds no value.

Currently, the census does exactly what it should not: a one-time capture, even though population data has several dimensions that change continuously.

A one-time census capture leaves many gaps. For example, in the 25-35 age group (approx. 250 million people), many might be recorded as ‘unmarried’ during the census, and there is a very high likelihood that they marry after it. So within just 5 years, the data for a population of 250 million, or at least 50% of it, changes, making the census data on marital status irrelevant in just 1-5 years.

In simple terms, within 5 years the 2021 data would become stale, and forecasting methods could give only rough projections, not accurate data.
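The difference between a one-time snapshot and a continuous stream can be shown in a few lines. In this sketch (people, dates, and statuses are hypothetical), every change in marital status is recorded as an event; a census is then just one reading of that stream at a single date, whereas the stream can be replayed for any date.

```python
from datetime import date

# Hypothetical stream of marital-status events: (person, date, new status).
# A continuous capture records each change when it happens.
events = [
    ("P1", date(1995, 5, 1), "unmarried"),
    ("P1", date(2023, 2, 10), "married"),
    ("P2", date(1998, 1, 1), "unmarried"),
    ("P3", date(1990, 3, 3), "unmarried"),
    ("P3", date(2019, 11, 20), "married"),
]


def status_as_of(events, as_of):
    """Replay events in date order and return each person's status on `as_of`."""
    state = {}
    for person, when, status in sorted(events, key=lambda e: e[1]):
        if when > as_of:
            break  # events are sorted, so everything after this is in the future
        state[person] = status
    return state


# A census is one call at a single date; the stream can answer for any date.
census_2021 = status_as_of(events, date(2021, 3, 1))
five_years_later = status_as_of(events, date(2026, 3, 1))
stale = [p for p in five_years_later if five_years_later[p] != census_2021.get(p)]
print(stale)
```

Here the 2021 reading still lists P1 as unmarried, while the stream already knows otherwise five years on.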

In the case of marital status, at least, projections can turn out correct (astrology and horoscopes aside, on a lighter note), but faith or religious denomination is a tricky thing, and there is no way to check it.

To solve the above problem, we can apply several Data Engineering concepts to track the trends, behavior, and movement of people between faiths over time.

In Data Engineering, there is a concept called the Slowly Changing Dimension (SCD): a dimension whose attributes change over time. There are several types of SCD handling, namely SCD 0, SCD 1, SCD 2, and SCD 3, and the later additions SCD 4, 5, and 6.

For example, consider a person named Sai Deepak (a random name with no connection to Advocate J Sai Deepak, who has told me that I have “put him in a lot of trouble”, Supreme Court Advocate though he is and a mere Analyst though I am; so it’s not him), and suppose we want to analyze how he has moved since birth, i.e. treat his location as a Dimension.

Assume he was born in Hyderabad, studied Engineering in Chennai and then law in Kharagpur, and now lives in New Delhi, where he earlier worked as a lawyer (Vakil) and is now an Author.

Under SCD 0, the captured value is never updated, so it carries no history: the record says Sai Deepak is a lawyer (Vakil) in New Delhi, and that’s all.

SCD 0
Name	Location	Profession
Sai Deepak	New Delhi	Lawyer
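In the usual textbook definition, a Type 0 attribute keeps whatever value was first captured and ignores later changes. A minimal Python sketch, assuming the dimension is a plain dict keyed by name (real tables use surrogate keys; the function name is hypothetical):

```python
def scd0_upsert(dimension, key, attrs):
    """SCD Type 0: insert a new member; never change an existing one."""
    if key not in dimension:
        dimension[key] = dict(attrs)
    return dimension


people = {}
scd0_upsert(people, "Sai Deepak", {"location": "New Delhi", "profession": "Lawyer"})
# He later becomes an Author, but Type 0 ignores the change.
scd0_upsert(people, "Sai Deepak", {"location": "New Delhi", "profession": "Author"})
print(people["Sai Deepak"])  # {'location': 'New Delhi', 'profession': 'Lawyer'}
```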

Under SCD 1, changed values are simply overwritten with the most recent ones: he recently became an Author in New Delhi, so only the Profession is replaced, and no history is kept.

SCD 1
Name	Location	Profession
Sai Deepak	New Delhi	Author
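A Type 1 update is a plain in-place overwrite. Same assumptions as before: a dict keyed by name and a hypothetical helper name.

```python
def scd1_upsert(dimension, key, attrs):
    """SCD Type 1: overwrite changed attributes in place; no history is kept."""
    dimension.setdefault(key, {}).update(attrs)
    return dimension


people = {}
scd1_upsert(people, "Sai Deepak", {"location": "New Delhi", "profession": "Lawyer"})
scd1_upsert(people, "Sai Deepak", {"profession": "Author"})
print(people["Sai Deepak"])  # {'location': 'New Delhi', 'profession': 'Author'}
```

Once overwritten, the earlier value "Lawyer" is gone for good, which is exactly why Type 1 cannot answer questions about change over time.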

Under SCD 2, the person’s full history of location and profession is captured, one row per change, with start and end years.

SCD 2
Name	Location	Start	End	Profession	Active Flag (on Location)
Sai Deepak	New Delhi	2021	–	Author	Yes
Sai Deepak	New Delhi	2010	–	Lawyer	Yes
Sai Deepak	Kharagpur	2005	2010	Law Student	No
Sai Deepak	Chennai	2001	2005	Engg Student	No
Sai Deepak	Hyderabad	1989	2001	Sch. Student	No
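The Type 2 mechanics can be sketched as: find the current row; if anything changed, close it with an end year and append a new current row. One assumption differs from the table above: in the common textbook form the flag marks the current version of the whole row, so the Lawyer row is closed in 2021, whereas the table sets the flag on location. Names and the list-of-dicts storage are hypothetical.

```python
def scd2_apply(rows, key, attrs, year):
    """SCD Type 2: expire the current version and append a new one on change."""
    current = next((r for r in rows if r["key"] == key and r["is_current"]), None)
    if current is not None:
        if all(current[k] == v for k, v in attrs.items()):
            return rows  # nothing changed, so no new version
        current["end"] = year
        current["is_current"] = False
    rows.append({"key": key, **attrs, "start": year, "end": None, "is_current": True})
    return rows


history = []
for year, location, profession in [
    (1989, "Hyderabad", "School Student"),
    (2001, "Chennai", "Engineering Student"),
    (2005, "Kharagpur", "Law Student"),
    (2010, "New Delhi", "Lawyer"),
    (2021, "New Delhi", "Author"),
]:
    scd2_apply(history, "Sai Deepak", {"location": location, "profession": profession}, year)

for r in history:
    print(r["start"], r["end"], r["location"], r["profession"], r["is_current"])
```

Every past state survives as its own row, so "where was he, and what was he doing, in 2003?" becomes a simple filter on the start and end years.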

Under SCD 3, the data for Sai Deepak would look like this; each record also captures the person’s previous location.

SCD 3
Name	Location	Previous Location	Profession
Sai Deepak	New Delhi	New Delhi	Author
Sai Deepak	New Delhi	New Delhi	Lawyer
Sai Deepak	Kharagpur	Chennai	Law Student
Sai Deepak	Chennai	Hyderabad	Engg Student
Sai Deepak	Hyderabad	–	Sch. Student
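In its textbook form, Type 3 keeps a single row per person with a "current" and a "previous" column, so only one prior value survives; the table above lists one row per stage for readability. A minimal sketch, with a hypothetical helper name:

```python
def scd3_update(row, new_location):
    """SCD Type 3: move the current value into 'previous_location', then overwrite."""
    if row["location"] != new_location:
        row["previous_location"] = row["location"]
        row["location"] = new_location
    return row


person = {"name": "Sai Deepak", "location": "Hyderabad", "previous_location": None}
for location in ["Chennai", "Kharagpur", "New Delhi", "New Delhi"]:
    scd3_update(person, location)
print(person)
# {'name': 'Sai Deepak', 'location': 'New Delhi', 'previous_location': 'Kharagpur'}
```

Hyderabad and Chennai have dropped out entirely, which is the trade-off of Type 3: cheap to store, but only one step of history.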

Finally, SCD 6, the most extensive form of capture, combines SCD 1, 2, and 3 (1 + 2 + 3 = 6):

SCD 6
Name	Location	Previous Location	Profession	Start	End	Active Flag (on Location)
Sai Deepak	New Delhi	New Delhi	Author	2021	–	Yes
Sai Deepak	New Delhi	New Delhi	Lawyer	2010	–	Yes
Sai Deepak	Kharagpur	Chennai	Law Student	2005	2010	No
Sai Deepak	Chennai	Hyderabad	Engg Student	2001	2005	No
Sai Deepak	Hyderabad	–	Sch. Student	1989	2001	No
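In the textbook Type 6, the Type 2 rows are kept, each row stores the value it was created with (Type 3 style), and a "current" column is overwritten on every one of the member’s rows (Type 1), so any old row can be reported against today’s value. The sketch below tracks location only; the column and function names are hypothetical.

```python
def scd6_apply(rows, key, location, year):
    """SCD Type 6 (1 + 2 + 3) for one attribute, 'location'.

    Type 2: each change closes the current row and appends a new version.
    Type 3: every row carries both the value it was created with
            ('historical_location') and a 'current_location' column.
    Type 1: 'current_location' is overwritten on all of the member's rows.
    """
    current = next((r for r in rows if r["key"] == key and r["is_current"]), None)
    if current is not None:
        if current["historical_location"] == location:
            return rows  # no change
        current["end"] = year
        current["is_current"] = False
    rows.append({"key": key, "historical_location": location,
                 "current_location": location, "start": year,
                 "end": None, "is_current": True})
    for r in rows:
        if r["key"] == key:
            r["current_location"] = location
    return rows


history = []
for year, location in [(1989, "Hyderabad"), (2001, "Chennai"),
                       (2005, "Kharagpur"), (2010, "New Delhi")]:
    scd6_apply(history, "Sai Deepak", location, year)

for r in history:
    print(r["historical_location"], r["current_location"], r["start"], r["end"])
```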

If demographic data were organized like this for every individual, capturing the religious profile along with income, the questions raised at the beginning of this article could be answered easily.
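As an illustration of how the two opening questions become queries once such history exists, here is a sketch over SCD 2-style faith records. Everything in it is invented: the people, genders, incomes, years, and the tiny sample size. The correlation it prints says nothing about the real world, and a real study would need the proper statistical models mentioned earlier.

```python
from collections import Counter, defaultdict

INDIGENOUS = {"Hindu", "Jain", "Buddhist", "Sikh"}

# Invented person dimension: person_id -> (gender, annual income).
persons = {
    "A": ("F", 12000), "B": ("M", 15000), "C": ("M", 60000),
    "D": ("F", 45000), "E": ("F", 10000),
}

# Invented SCD 2-style faith history: (person_id, faith, start_year).
faith_history = [
    ("A", "Hindu", 1990), ("A", "Christian", 2015),
    ("B", "Hindu", 1985), ("B", "Christian", 2018),
    ("C", "Hindu", 1980),
    ("D", "Jain", 1992),
    ("E", "Hindu", 1995), ("E", "Christian", 2018),
]


def conversions_by_year_and_gender(history, persons):
    """Count moves out of an Indigenous faith, keyed by (year, gender)."""
    versions = defaultdict(list)
    for pid, faith, start in history:
        versions[pid].append((start, faith))
    counts, converted = Counter(), set()
    for pid, rows in versions.items():
        rows.sort()
        for (_, before), (year, after) in zip(rows, rows[1:]):
            if before in INDIGENOUS and after not in INDIGENOUS:
                counts[(year, persons[pid][0])] += 1
                converted.add(pid)
    return counts, converted


def pearson(xs, ys):
    """Pearson correlation; with a 0/1 variable this is the point-biserial r."""
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    return cov / (var_x * var_y) ** 0.5


counts, converted = conversions_by_year_and_gender(faith_history, persons)
ids = sorted(persons)
r = pearson([persons[p][1] for p in ids], [1 if p in converted else 0 for p in ids])
print(dict(counts), round(r, 2))
```

The year-on-year, gender-wise counts come straight from consecutive versions in the history, which a single census snapshot cannot provide.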

Although SCD and Data Engineering sound like modern concepts, the idea has historical roots in Bharat, in a different form. The Pandits in Haridwar maintain family genealogies that record the male members born into each family, whom they married, and so on.

Every time a Hindu visits Haridwar to perform death rites or Pitrudhaan, the pandits record the family’s name, native place, gotra, whom the members are married to, the male members and the total number of members, and so on; this has been going on for over 20 generations spanning more than 5 centuries.[4]

This method is similar to a Slowly Changing Dimension, where the dimension, in this case, is the family name and gotra. We can call this ‘Traditional SCD’.

We must also note that this is not the same as the census, because the census captures a record but does not verify whether it is correct, i.e. whether Ranjani or Ravikumar are Hindus.

The census will capture ‘Ranjani Christopher’ and ‘Ravikumar Fernandez’ as Hindus, whereas they would be caught red-handed in a ‘Traditional SCD’ setup, because they would not be able to state their gotra, nor would they ever perform Pitrudhaan in Haridwar.

Now, the main question is how to implement this. Adding it on to the census is not the answer. The answer lies in digitizing the pilgrim records of the major Hindu and Jain temples and their mutts. By digitizing, we mean capturing all the necessary dimensions of each pilgrim every time they enter the temple premises.

The Tirumala Tirupati Devasthanams (TTD) online booking database would hold reasonably good information about pilgrims, because bookings are validated with a valid identity card. Even that data is incomplete, however, because the vast majority of pilgrims still visit without filling in any online form, and the same is true of every major temple.

The only challenge is that the initial data capture has to be done with extreme caution, making sure that people of other faiths do not intrude into the temple registry.

We need to make digitization compulsory at the major temples because they are the main source for validating the condition of being a ‘Practicing Hindu or Jain’: these temples admit only Hindus and not people of other faiths.

Along with this, we must also recognize the security aspects concerning the Indigenous faiths, their temples, and their properties. No Hindu can enter Mecca, yet countless people have made their way into several Hindu temples by faking their identity.

Followers of the Indigenous faiths need to understand that if Bharat is to sustain its democracy, their numbers matter, and the democracy of this country will be strong only when those numbers are properly enumerated, continuously validated, and made known to the public.

The process of visualizing the streamlined, validated demographic data and making it publicly available for analysis and insights is what I term “Demographic Data Intelligence”, the demographic counterpart of Business Intelligence.

If Hindu society is to make a solid case that serious coercive and forced conversions are taking place, it needs data and numbers to back it. The first step towards proving a demographic shift is a thorough, careful data capture at all Hindu places of worship, feeding pipelines of demographic data streams.

In the next part, we will look at the policy aspects that need to be in place and how this can be implemented correctly, so that we can test hypotheses such as “Is there a correlation between income and the propensity to change faith?”

We will also look at CI/CD pipelines, i.e. Continuous Integration and Continuous Delivery/Deployment, from a Data Engineering perspective, and how they can be applied to the demographic profiling of pilgrims.

To be continued…

Note:
The views expressed here are those of the author and do not necessarily represent or reflect the views of PGurus.

References:

[1] Data Engineering and Its Main Concepts: Explaining the Data Pipeline, Data Warehouse, and Data Engineer Role – Aug 26, 2021, Altexsoft

[2] What are Slowly Changing Dimensions? – DW4U

[3] What is Data Warehousing? Concepts, Features, and Examples – Nov 06, 2020, Astera

[4] Rooting out a Hindu family history the traditional way – Jun 14, 2012, BBC