Analyzing Data with Python: Counting how many times each value in a CSV occurs

Data analysis is an essential aspect of many fields, from business and research to sports and education. Python, with its versatile libraries, is a popular choice for data analysis tasks. In this blog post, we’ll explore a Python script that reads data from a CSV file and counts the occurrences of each value in 3 columns. The script can be a valuable tool for gaining insights into educational information datasets.

#!/bin/python

import csv

# Create empty dictionaries
d_university = dict()
d_country = dict()
d_region = dict()

with open('XtremeScores.csv') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)
    # Loop through each line of the file
    for line in reader:
        word = line[6]
        # Check if the word is already in dictionary
        if word in d_region:
            # Increment count of word by 1
            d_region[word] = d_region[word] + 1
        else:
            # Add the word to dictionary with count 1
            d_region[word] = 1

        word = line[5]
        # Check if the word is already in dictionary
        if word in d_country:
            # Increment count of word by 1
            d_country[word] = d_country[word] + 1
        else:
            # Add the word to dictionary with count 1
            d_country[word] = 1

        word = line[4]
        # Check if the word is already in dictionary
        if word in d_university:
            # Increment count of word by 1
            d_university[word] = d_university[word] + 1
        else:
            # Add the word to dictionary with count 1
            d_university[word] = 1

sorted_university = sorted(d_university.items(), key=lambda x:x[1], reverse=True)
print(sorted_university[:10])
sorted_country = sorted(d_country.items(), key=lambda x:x[1], reverse=True)
print(sorted_country[:10])
sorted_region = sorted(d_region.items(), key=lambda x:x[1], reverse=True)
print(sorted_region[:10])

Let’s break down the code step by step:

#!/bin/python

import csv

The script starts by importing the csv module, which is essential for handling comma-separated value (CSV) files.

# Create empty dictionaries
d_university = dict()
d_country = dict()
d_region = dict()

Three empty dictionaries, d_university, d_country, and d_region, are created. These dictionaries will be used to store the counts of universities, countries, and regions, respectively.

with open('XtremeScores.csv') as f:
    reader = csv.reader(f, delimiter=',', quoting=csv.QUOTE_NONE)

The script opens a CSV file named ‘XtremeScores.csv’ using a with statement. The csv.reader object is used to read the contents of the file. We specify that the delimiter is a comma (,), and we don’t want to perform any special quoting.

    # Loop through each line of the file
    for line in reader:

A for loop iterates through each line in the CSV file. The variable line contains the data for each row in the file.

        word = line[6]

The code extracts the word at index 6 (zero-based index) from the current line. In this context, it typically represents a region.

        # Check if the word is already in the dictionary
        if word in d_region:
            # Increment count of word by 1
            d_region[word] = d_region[word] + 1
        else:
            # Add the word to the dictionary with count 1
            d_region[word] = 1

The code checks if the extracted word (representing a region) is already present in the d_region dictionary. If it is, it increments the count by 1. If not, it adds the word to the dictionary with a count of 1. This process counts the occurrences of each region in the dataset.

The code repeats the same process for words representing countries (at index 5) and universities (at index 4), using the d_country and d_university dictionaries, respectively.

sorted_university = sorted(d_university.items(), key=lambda x:x[1], reverse=True)
print(sorted_university[:10])

After counting the universities, the code sorts them in descending order of frequency and prints the top 10 universities based on their occurrence.

sorted_country = sorted(d_country.items(), key=lambda x:x[1], reverse=True)
print(sorted_country[:10])

Similarly, it does the same for countries and prints the top 10 countries.

sorted_region = sorted(d_region.items(), key=lambda x:x[1], reverse=True)
print(sorted_region[:10])

Finally, it sorts and prints the top 10 regions based on their occurrence.

In summary, this Python script provides a simple but effective way to analyze data in a CSV file, specifically counting universities, countries, and regions. It leverages dictionaries to maintain counts and uses the csv module for reading data from the file. This can be a useful starting point for more advanced data analysis tasks and visualization in Python.


How to “group by” and sum or count in LibreOffice Calc (Excel)

In the world of spreadsheet applications, LibreOffice Calc stands out as a versatile and powerful tool for managing data. It offers a wide array of features that can help you organize, analyze, and make sense of your data. One of these features, often underutilized, is the Subtotals functionality. In this blog post, we’ll explore how to use the Subtotals functionality in the Data menu of LibreOffice Calc to count how many times each item is repeated in a dataset. This is particularly useful when working with large datasets or lists, as it allows you to create summary reports without the need for complex formulas or manual counting.

Preparing Your Data

Start by opening LibreOffice Calc and loading the dataset you want to analyze. Ensure that your data is organized in columns and that each item you want to count is in a separate column. For example, if you have a list of products, each product name should be in its own column.

Sorting Your Data

To use the Subtotals functionality effectively, your data needs to be sorted by the column containing the items you want to count. To sort your data:

  • Select the entire dataset by clicking and dragging your mouse.
  • Go to the “Data” menu, and then click on “Sort.”

Sort Data

  • In the “Sort Criteria” dialog box, select the column containing the items you want to count.
  • Choose the sorting order (ascending or descending), and click “OK.”

Your data is now sorted and ready for subtotal analysis.

Using the Subtotals Functionality

With your data sorted, you can now use the Subtotals functionality:

  • Select the entire dataset again.
  • Go to the “Data” menu and click on “Subtotals.”

Subtotals

In the “Subtotals” dialog box, you’ll see options for grouping and summarizing your data. By default, it may suggest using the first column for grouping, which is what you want in most cases.

Subtotals Dialog

  • In the “Function” dropdown, choose the type of summary you want, which is “Count” in this case.
  • Make sure that the “Replace current subtotals” option is selected.
  • Click “OK.”

LibreOffice Calc will now calculate the subtotal counts for each item in your dataset and insert them into your spreadsheet. It will also group items together and provide an outline to help you navigate the summary.

Subtotals Result

The Subtotals functionality creates a summary of your data by grouping items and counting them. You can expand and collapse these groups using the outlined symbols to the left of the spreadsheet. This allows you to view the summary data in a more organized manner.

The Subtotals functionality in LibreOffice Calc is a powerful tool for analyzing data and generating summary reports. Whether you’re working with product lists, customer data, or any other dataset, Subtotals can help you count how many times each item is repeated without the need for complex formulas or manual counting. By following the steps outlined in this blog post, you can harness the full potential of LibreOffice Calc and make your data analysis tasks more efficient and accurate. Give it a try, and you’ll be amazed at how Subtotals can streamline your data analysis workflow.


How to Manually Set Your Starlink Dish to Stow Mode

Starlink has emerged as a beacon of hope for many remote and underserved regions in a world increasingly reliant on high-speed internet connectivity. However, there might be situations where you need to set your Starlink dish to stow mode, but you don’t have access to it via your phone due to connectivity issues or because you’ve enabled bypass mode. In such cases, you can manually stow your Starlink dish by following these simple steps.

Step 1: Power Off the Starlink Dish

The first step is to turn off the power to your Starlink dish. This can be done by unplugging it or switching it off using the provided power source.

Step 2: Remove the Base

Depending on your installation, your Starlink dish is securely attached to a base, which can be metallic or any other kind of base. You’ll need to detach the dish from this base. This is typically done by removing bolts, screws, or other fasteners holding the dish in place.

Step 3: Place the Dish Upside Down

After you’ve separated the dish from its base, place it upside down on a clean and flat surface. Ensure the surface is level to allow a smooth transition to stow mode.

Step 4: Power On the Starlink Dish

Now, power on the Starlink dish by reconnecting it to its power source. The dish will go through its initialization process, but when it detects that it’s in an unusual position (upside down), it will initiate the stow mode.

Step 5: Wait for the Dish to Stow

After a few seconds (usually less than a minute), the Starlink dish will automatically move to stow mode. During this process, it will adjust its position and fold into a more compact configuration, which is ideal for storage and transportation.

By following these steps, you can manually set your Starlink dish to stow mode even when you can’t access it through your phone due to connectivity issues or because you’ve enabled bypass mode. This is particularly helpful when storing your Starlink equipment safely or transporting it to a different location.

Remember that these steps are intended for emergencies or specific use cases. Ideally, you should use the Starlink app or web interface to manage your dish. However, manually setting your Starlink dish to stow mode can be a handy backup plan when technology doesn’t cooperate.


Using Face Recognition in Python to Extract Faces from Images

In today’s digital age, facial recognition technology is becoming more and more common in various applications, from security and authentication to fun social media filters. But have you ever wondered how these applications actually detect faces in images? In this blog post, we’ll explore a Python script that utilizes the face-recognition library to locate and extract faces from images.

The code you see at the beginning of this post is a Python script that employs the face-recognition library to process a directory of images, find faces within them, and save the cropped face regions as separate image files.

Prerequisites

Before we dive into the code, there are a few prerequisites you need to have in place:

  1. Python: You should have Python installed on your system.
  2. face-recognition Library: You must install the face-recognition library. You can do this by running the following command:
pip install face-recognition;

Understanding the Code

Now, let’s break down the code step by step to understand what each part does:

#!/bin/python

from PIL import Image
import face_recognition
import sys
import os

inputDirectory = sys.argv[1]
outputDirectory = sys.argv[2]

  • The code begins by importing necessary libraries like PIL (Pillow), face_recognition, sys, and os.
  • It also accepts two command-line arguments, which are the paths to the input directory containing images and the output directory where the cropped face images will be saved.
for filename in os.listdir(inputDirectory):
    path = os.path.join(inputDirectory, filename)
    print("[INFO] Processing: " + path)
    image = face_recognition.load_image_file(path)
    faces = face_recognition.face_locations(image, model="cnn")
    print("[INFO] Found {0} Faces.".format(len(faces)))

  • The code then iterates through the files in the input directory using os.listdir(). For each file, it constructs the full path to the image.
  • It loads the image using face_recognition.load_image_file(path).
  • The face_recognition.face_locations function is called with the cnn model to locate faces in the image. The cnn model is more accurate than the default HOG-based model.
  • The number of detected faces is printed for each image.
    for (top, right, bottom, left) in faces:
        print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
        face_image = image[top:bottom, left:right]
        pil_image = Image.fromarray(face_image)
        pil_image.save(outputDirectory + filename + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces.jpg')

  • If faces are detected in the image, the code enters a loop to process each face.
  • It prints the pixel locations of the detected face.
  • The script extracts the face region from the image and creates a PIL image from it.
  • Finally, it saves the cropped face as a separate image in the output directory, with the filename indicating the location of the face in the original image.

Running the Code

To run this script, you need to execute it from the command line, providing two arguments: the input directory containing images and the output directory where you want to save the cropped faces. Here’s an example of how you might run the script:

python face_extraction.py input_images/ output_faces/;

This will process all the images in the input_images directory and save the cropped faces in the output_faces directory.

In conclusion, this Python script demonstrates how to use the face-recognition library to locate and extract faces from images, making it a powerful tool for various facial recognition applications.

Full Code

#!/bin/python

# Need to install the following:
# pip install face-recognition

from PIL import Image
import face_recognition
import sys
import os

inputDirectory = sys.argv[1];
outputDirectory = sys.argv[2];

for filename in os.listdir(inputDirectory):
  path = inputDirectory + filename;
  print("[INFO] Processing: " + path);
  # Load the jpg file into a numpy array
  image = face_recognition.load_image_file(path)
  # Find all the faces in the image using the default HOG-based model.
  # This method is fairly accurate, but not as accurate as the CNN model and not GPU accelerated.
  #faces = face_recognition.face_locations(image)
  faces = face_recognition.face_locations(image, model="cnn")

  print("[INFO] Found {0} Faces.".format(len(faces)));

  for (top, right, bottom, left) in faces:
    #print("[INFO] Object found. Saving locally.");
    print("A face is located at pixel location Top: {}, Left: {}, Bottom: {}, Right: {}".format(top, left, bottom, right))
    face_image = image[top:bottom, left:right]
    pil_image = Image.fromarray(face_image)
    pil_image.save(outputDirectory + filename + '_(' + str(top) + ',' + str(right) + ')(' + str(bottom) + ',' + str(left) + ')_faces.jpg')