Questions tagged [dataframe]

A data frame is a two-dimensional tabular data structure. Typically, rows are observations and columns are variables, and the columns may be of different types (unlike an array or matrix). "Data frame" or "dataframe" is the term used for this concept in several languages and libraries (R, Apache Spark, Deedle, Maple, the pandas library in Python, and the DataFrames library in Julia), while "table" is the term used in MATLAB and SQL.
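As a small illustration of the "columns of different types" point, here is a minimal pandas sketch. The column names and values are made up for the example.

```python
import pandas as pd

# Hypothetical data: each column holds a different type.
df = pd.DataFrame(
    {
        "name": ["a", "b", "c"],    # text
        "count": [1, 2, 3],         # integers
        "score": [0.5, 1.5, 2.5],   # floats
    }
)

# Each column keeps its own dtype, unlike a homogeneous array or matrix.
print(df.dtypes)
```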

122,637 questions
0 votes
0 answers
6 views

How do I add up rows in PANDAS and keep it at the bottom within my statement?

I am trying to add totals to the bottom of my columns before exporting my dataframes to Excel/CSV files. What is the best way to do this? I had been using ''...
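One possible approach, sketched with pandas under the assumption that the totals should cover only the numeric columns. The column names `item`, `qty`, and `price` are hypothetical and do not come from the question.

```python
import pandas as pd

# Hypothetical data; the question's real columns are not shown.
df = pd.DataFrame({"item": ["x", "y"], "qty": [2, 3], "price": [1.5, 2.0]})

# Sum the numeric columns only, then label the row in the text column.
total_row = {"item": "Total", **df.select_dtypes("number").sum().to_dict()}

# Append the totals as a final row; the original frame is left unchanged.
out = pd.concat([df, pd.DataFrame([total_row])], ignore_index=True)
print(out)
```

The result `out` can then be written with `out.to_csv(...)` or `out.to_excel(...)` as usual.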
0 votes
0 answers
8 views

Groupby column and create lists for other two columns - PySpark

I have a PySpark dataframe that looks like this: Id timestamp col1 col2 abc 789 0 1 def 456 ...
1 vote
2 answers
15 views

Insert new column into dataframe as a VARIABLE

I used the following to obtain the state from a zip code: from uszipcode import SearchEngine engine = SearchEngine() zipcode = engine.by_zipcode(ZIPCODE) **st** = print(zipcode.state) Now I am trying to ...
0 votes
0 answers
12 views

Error: 3 columns passed, passed data had 1 columns (Creating DataFrame from lists)

I wrote a function that outputs 3 lists and want to make those lists each a column in a dataframe. The function returns a tuple of 3 lists, containing text or lists of text. Here is the function: def ...
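A minimal sketch of one common way to turn three equal-length lists into three columns: unpack the tuple and pass a dict that maps column names to lists. The function `make_lists` and the column names are hypothetical stand-ins, since the question's own function is not shown.

```python
import pandas as pd


def make_lists():
    """Hypothetical stand-in: returns a tuple of three equal-length lists."""
    names = ["a", "b"]
    counts = [1, 2]
    tags = [["x"], ["y", "z"]]  # a list of lists is stored one list per cell
    return names, counts, tags


names, counts, tags = make_lists()

# One key per column, so each list becomes a column rather than a row.
df = pd.DataFrame({"name": names, "count": counts, "tags": tags})
print(df)
```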
1 vote
1 answer
19 views

How to set second and following occurrences of 0 to NaN in python

I am trying to figure out how to set all second occurrences of 0 in my dataframe to NaN in Python. However, this is a transposed dataframe, so 0 would occur across the columns. To explain, I have the ...
0 votes
1 answer
13 views

Pandas replace() string with int "Cannot set non-string value in StringArray"

I'm trying to replace strings with integers in a pandas dataframe. I've already visited here but the solution doesn't work. Reprex: import pandas as pd pd.__version__ > '1.4.1' test = pd....
0 votes
0 answers
15 views

How to avoid duplication when gathering (twice) from wide to long format in R?

In dataset df I want to calculate the variance between samples as shown in this example: ###Only calculating in wide format set.seed(111) df <- data.frame(month = rep(c("A","B",...
0 votes
2 answers
20 views

Add a column from a function of 2 other columns in PySpark

I have two columns in a data frame df in PySpark: | features | center | +----------+----------+ | [0,1,0] | [1.5,2,1]| | [5,7,6] | [10,7,7] | I want to create a function which calculates the ...
0 votes
0 answers
19 views

How to print only specific columns based on a cell condition in Python Pandas (dataframe)?

I have to print the columns "COrder_ID" and "Supplier_Name" only for the rows that contain the word "Expired" in the column "Reviews". I am using the ...
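A minimal sketch of one way to do this in pandas, using a boolean mask with `.loc`. The column names come from the question; the sample rows are invented for illustration.

```python
import pandas as pd

# Invented sample rows; only the column names come from the question.
df = pd.DataFrame(
    {
        "COrder_ID": [1, 2, 3],
        "Supplier_Name": ["A", "B", "C"],
        "Reviews": ["Expired item", "Fine", None],
    }
)

# True where "Reviews" contains the word; missing reviews count as no match.
mask = df["Reviews"].str.contains("Expired", na=False)

# Keep the matching rows and only the two requested columns.
result = df.loc[mask, ["COrder_ID", "Supplier_Name"]]
print(result)
```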
0 votes
0 answers
19 views

Is there any python or pandas function to give me values that aren't shared between two lists and also tells me which list they came from?

I have two dataframes and I need to get only the rows where the name value (GN) is unique (not in the other dataframe). This is what I have come up with so far, but I would like to know if there are any ...
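One option that also reports where each row came from is an outer merge with `indicator=True`. This is a sketch on invented data; only the column name `GN` is taken from the question.

```python
import pandas as pd

# Invented sample data; only the column name "GN" comes from the question.
left = pd.DataFrame({"GN": ["ann", "bob", "cy"]})
right = pd.DataFrame({"GN": ["bob", "dee"]})

# The "_merge" column records "left_only", "right_only", or "both" per row.
merged = left.merge(right, on="GN", how="outer", indicator=True)

# Keep only the names that appear in exactly one of the two frames.
only_one = merged[merged["_merge"] != "both"]
print(only_one)
```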
0 votes
1 answer
14 views

Python - count and Difference data frames

I have two data frames about occupation in industry in 2005 and 2006. I would like to create a df with a column showing the change between these years, that is, whether it grew or decreased. Here is a ...
0 votes
1 answer
20 views

Pandas Re-ordering rows when concatenating [closed]

I've manipulated/ordered two dataframes using pandas, and they print out as expected. However, when using pd.concat([df1, df2], axis=1) it concatenates by column, but the df2 order is lost. The first ...
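One possible cause, offered as an assumption since the question's data is not shown: `pd.concat(..., axis=1)` aligns rows on index labels, not on position, so a frame whose index was reordered by sorting gets realigned. The sketch below uses invented frames to show both behaviours.

```python
import pandas as pd

# Invented frames: df2 has been reordered, so its index labels are 2, 0, 1.
df1 = pd.DataFrame({"a": [1, 2, 3]}, index=[0, 1, 2])
df2 = pd.DataFrame({"b": [30, 10, 20]}, index=[2, 0, 1])

# Rows are matched by index label, so df2's displayed order is not kept.
aligned = pd.concat([df1, df2], axis=1)

# Resetting both indexes first pairs the rows purely by position.
positional = pd.concat(
    [df1.reset_index(drop=True), df2.reset_index(drop=True)], axis=1
)

print(aligned)
print(positional)
```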
0 votes
0 answers
23 views

trying to get ratio of numbers in a rolling window. TypeError: unsupported operand type(s) for /: 'Rolling' and 'int'

I'm trying to get the ratio of numbers in a certain column that are above 0 in a rolling window, as a new column. I tried the following: df['ratio'] = pd.DataFrame.rolling(df[df['column']>0].count(),...
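A minimal sketch of one way to compute this: turn the condition into 0/1 flags and take the rolling mean, which is the share of positive values in each window. The window size of 3 and the sample values are assumptions for illustration.

```python
import pandas as pd

# Invented sample values; the window size of 3 is an assumption.
df = pd.DataFrame({"column": [1, -1, 2, 3, -2, 0]})

# 1 where the value is above 0, otherwise 0.
is_positive = (df["column"] > 0).astype(int)

# The rolling mean of the flags is the fraction of positives per window.
# The first window - 1 rows are NaN because the window is not yet full.
df["ratio"] = is_positive.rolling(window=3).mean()
print(df)
```

Here `rolling(...)` is called on the Series itself and then reduced with `.mean()`, so no arithmetic is ever applied to the `Rolling` object directly.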
0 votes
1 answer
20 views

Sorting specific values in a dataframe column based on another column in the dataframe

I'm trying to write Python code to summarize a number of F1 statistics CSV files. Currently what I'm trying to summarize is the top 20 F1 drivers with the most wins as well as the top 10 nations with ...
-1 votes
0 answers
13 views

How to perform enrichment analysis using enrichKEGG with KEGGREST

Package ‘KEGG.db’ is not available for Bioconductor version '3.15', and I need it to perform my enrichment analysis. If anyone knows how we can perform enrichKEGG with KEGGREST, can you please write the ...
