# count

## Analyzing Data with Python: Counting how many times each value in a CSV occurs

Data analysis is an essential aspect of many fields, from business and research to sports and education. Python, with its versatile libraries, is a popular choice for data analysis tasks. In this blog post, we’ll explore a Python script that reads data from a CSV file and counts the occurrences of each value in 3 columns. The script can be a valuable tool for gaining insights into educational information datasets.

```#!/bin/python

import csv

# Create empty dictionaries
d_university = dict()
d_country = dict()
d_region = dict()

with open('XtremeScores.csv') as f:
# Loop through each line of the file
word = line[6]
# Check if the word is already in dictionary
if word in d_region:
# Increment count of word by 1
d_region[word] = d_region[word] + 1
else:
# Add the word to dictionary with count 1
d_region[word] = 1

word = line[5]
# Check if the word is already in dictionary
if word in d_country:
# Increment count of word by 1
d_country[word] = d_country[word] + 1
else:
# Add the word to dictionary with count 1
d_country[word] = 1

word = line[4]
# Check if the word is already in dictionary
if word in d_university:
# Increment count of word by 1
d_university[word] = d_university[word] + 1
else:
# Add the word to dictionary with count 1
d_university[word] = 1

sorted_university = sorted(d_university.items(), key=lambda x:x[1], reverse=True)
print(sorted_university[:10])
sorted_country = sorted(d_country.items(), key=lambda x:x[1], reverse=True)
print(sorted_country[:10])
sorted_region = sorted(d_region.items(), key=lambda x:x[1], reverse=True)
print(sorted_region[:10])

```

Let’s break down the code step by step:

```#!/bin/python

import csv
```

The script starts by importing the `csv` module, which is essential for handling comma-separated value (CSV) files.

```# Create empty dictionaries
d_university = dict()
d_country = dict()
d_region = dict()
```

Three empty dictionaries, `d_university`, `d_country`, and `d_region`, are created. These dictionaries will be used to store the counts of universities, countries, and regions, respectively.

```with open('XtremeScores.csv') as f:
```

The script opens a CSV file named ‘XtremeScores.csv’ using a `with` statement. The `csv.reader` object is used to read the contents of the file. We specify that the delimiter is a comma (`,`), and we don’t want to perform any special quoting.

```    # Loop through each line of the file
```

A `for` loop iterates through each line in the CSV file. The variable `line` contains the data for each row in the file.

```        word = line[6]
```

The code extracts the word at index 6 (zero-based index) from the current line. In this context, it typically represents a region.

```        # Check if the word is already in the dictionary
if word in d_region:
# Increment count of word by 1
d_region[word] = d_region[word] + 1
else:
# Add the word to the dictionary with count 1
d_region[word] = 1
```

The code checks if the extracted word (representing a region) is already present in the `d_region` dictionary. If it is, it increments the count by 1. If not, it adds the word to the dictionary with a count of 1. This process counts the occurrences of each region in the dataset.

The code repeats the same process for words representing countries (at index 5) and universities (at index 4), using the `d_country` and `d_university` dictionaries, respectively.

```sorted_university = sorted(d_university.items(), key=lambda x:x[1], reverse=True)
print(sorted_university[:10])
```

After counting the universities, the code sorts them in descending order of frequency and prints the top 10 universities based on their occurrence.

```sorted_country = sorted(d_country.items(), key=lambda x:x[1], reverse=True)
print(sorted_country[:10])
```

Similarly, it does the same for countries and prints the top 10 countries.

```sorted_region = sorted(d_region.items(), key=lambda x:x[1], reverse=True)
print(sorted_region[:10])
```

Finally, it sorts and prints the top 10 regions based on their occurrence.

In summary, this Python script provides a simple but effective way to analyze data in a CSV file, specifically counting universities, countries, and regions. It leverages dictionaries to maintain counts and uses the `csv` module for reading data from the file. This can be a useful starting point for more advanced data analysis tasks and visualization in Python.

## Count how many submissions per score

The database contained several contests, each contest contained several challenges and any competitor could make multiple submissions.
We wanted to extract a couple of charts showing

• how many submissions we had per score and
• how many submissions we had per score while filtering out the best submission (max score) per contestant per challenge per contest

### The following code will return the number of submissions per score per challenge per contest.

```
SELECT contest_id, challenge_id, TRUNCATE(score, 1), COUNT(*)
FROM submissions
GROUP BY contest_id, challenge_id, TRUNCATE(score, 1)
ORDER BY contest_id, challenge_id, TRUNCATE(score, 1);

```

### The next one will return the number of submissions per score per challenge per contest while filtering out the best submission (max score) per contestant per challenge per contest:

```
SELECT contest_id, challenge_id, TRUNCATE(max_score, 1), COUNT(*)
FROM
(
SELECT contest_id, challenge_id, competitor_id, MAX(score) AS max_score
FROM submissions
GROUP BY contest_id, challenge_id, competitor_id
) AS max_scores
GROUP BY contest_id, challenge_id, TRUNCATE(max_score, 1)
ORDER BY contest_id, challenge_id, TRUNCATE(max_score, 1);

```

## Excel: A few useful functions

count() Counts all cells that contain numbers only.

counta() Counts all cells that contain any content (both numbers and text) a.k.a. are not empty.

countblank() Counts all cells that are empty.

## Bash: Read a file, line by line

The following script will accept from the keyboard a filename, later it will read it and process it line by line.
In this example we will just number the lines, print them and count the total number of lines in the file.

```#!/bin/sh
echo Enter the Filename to process